Introduction:

Primary  inflammatory  breast  cancer  (IBC)  accounts  for  approximately  3%  of  new  breast  cancers 
in  the  US.  This  form  of  locally  advanced  breast  cancer  is  characterized  clinically  by  erythema, 
warmth,  and  dimpling  of  the  skin  that  arise  rapidly,  typically  within  six  months.  IBC  is  generally 
not  associated  with  precursor  lesions  and  is  rapidly  invasive  from  the  outset,  especially  to  the 
skin  and  lymphatics,  and  is  highly  angiogenic  and  metastatic.  Because  of  this  disease’s  rapid 
progression, the effectiveness of aggressive multimodality treatment is limited; the mean 5-year
disease-free survival rate is less than 45%, making IBC the most lethal form of breast
cancer  (1).  This  rapid  progression  is  due  to  the  development  of  distant  metastases,  indicating  that 
the  tumors  quickly  acquire  the  ability  to  invade  and  metastasize  during  tumor  development.  This 
suggests  that  the  unique  aggressive  inflammatory  phenotype  of  IBC  is  the  result  of  a  limited 
number  of  concordant  genetic  alterations.  As  such,  IBC  constitutes  an  excellent  paradigm  to 
understand  aggressive  phenotypes  in  breast  cancer.  Previously,  our  laboratory  has  found 
concordant and consistent overexpression of RhoC GTPase in tissue samples
from  patients  with  IBC  as  compared  to  stage-matched  non-IBC  (2,  3).  We  have  also 
demonstrated  that  RhoC  GTPase  occupies  an  integral  role  in  the  aggressive  phenotype  of  IBC  (4, 
5).  With  the  increasing  evidence  that  RhoC  and  other  ras-homology  family  proteins  play  a 
significant  role  in  other  cancers  (6,  7),  the  therapeutic  importance  of  inhibiting  RhoC  activity  is 
clear, highlighting the crucial need to uncover the molecular mechanisms leading to the
RhoC-driven metastatic phenotype of IBC. In spite of this need, however, a model explaining the
mechanisms  of  RhoC  overexpression  in  breast  cancer  does  not  exist.  The  goal  of  this  award  is  to 
establish  such  a  model.  Our  central  hypothesis  was  that  overexpression  of  RhoC  GTPase  in 
metastatic  breast  cancer  is  due  to  gene  amplification,  epigenetic  deregulation,  transcription 
factor  deregulation,  and/or  enhanced  or  differential  mRNA  stability.  Because  of  these  cellular 
and  molecular  alterations,  early  stage  IBC  is  subject  to  rapid  metastatic  spread  through 
downstream  effectors  signaling  for  invasion  and  angiogenesis. 

Body: 

As  I  matriculated  through  graduate  school,  I  originally  thought  that  I  was  going  to  work  full  time 
in  Dr.  Merajver’s  lab  (first  graduate  school  rotation)  studying  IBC.  However,  as  I  was  granted 
this award I was also choosing to transfer into Dr. Arul Chinnaiyan’s lab. Fortunately, I was
granted  permission  by  the  DOD  to  transfer  the  award  to  follow  me  to  continue  working  on  this 
project from Dr. Chinnaiyan’s lab. Because of this, we have been able to establish an effective
and highly collaborative arrangement with Dr. Merajver in which I attend her bi-weekly lab meetings
and work with a technician in her lab to help complete this project. This has given me many
unique experiences, such as learning how to create well-defined experimental protocols and
making sure that the technician has the appropriate materials and controls to execute each
experiment. As such, continuing this DOD pre-doctoral grant has given me the opportunity to
continue existing collaborations and to keep improving my leadership skills by working
with a technician on a daily basis.

In  addition  to  working  with  a  technician,  I  have  also  had  the  opportunity  to  train  five 
undergraduate  students  through  the  University  of  Michigan  Undergraduate  Research 
Opportunities  Program  and  one  Master’s  degree  student.  The  students  have  learned  several 
different protocols including PCR, restriction digests, Gateway cloning, DNA miniprep, DNA
maxiprep, RNA isolation, cDNA synthesis, qRT-PCR, Western blotting, transfections of
both large DNA vectors and siRNA into mammalian cells, cell culture, production of lentivirus
and  lentiviral  transduction,  cell  invasion  assays,  cell  growth  assays  and  propidium  iodide 
staining.  Additionally,  I  have  led  a  bi-weekly  cancer  biology  journal  club  meeting  with  all  of  the 
students  in  our  lab  (26  undergraduates).  At  the  end  of  each  semester,  I  help  the  students  compile 
their  results  to  present  at  a  lab  meeting  and  at  an  Undergraduate  research  forum  by  both  poster 
presentation  and  lecture.  Importantly,  three  of  the  students  that  I  have  trained  have  been  awarded 
NIH  summer  fellowships  that  funded  their  work  in  the  lab  for  the  entire  summer. 

In addition to directly working with undergraduate students in the lab, as part of the training
plan in the statement of work I am currently a graduate assistant teaching Cancer Biology for incoming
graduate students. For this course, I coordinate different lectures and also prepare and give
several lectures throughout the semester. For example, this year I will be teaching lectures on
GTPase oncogenes like RhoC, DNA damage and translocations, as well as the use of
high-throughput sequencing for modern cancer biology. Additionally, I will give review courses for
the  other  professors’  lectures  and  have  weekly  open  office  hours  for  the  students.  Finally,  as  part 
of  the  training  program  I  have  been  able  to  host  several  speakers  for  a  University  of  Michigan 
speaker  series  and  co-ordinate  student  discussions  with  the  lectures  by  providing  background 
reading  and  a  background  lecture  for  incoming  speakers. 

While I have found reward in the successes that students have experienced after I worked with
them in the various teaching formats, I have also been able to learn several new experimental
techniques that I would not otherwise have had the opportunity to learn without this training
grant, including Solexa high-throughput transcriptome sequencing, fluorescence in situ
hybridization, and running aCGH and microRNA arrays. Perhaps more interesting are the
analysis algorithms that I am helping to develop, including those used to identify novel gene
fusions from paired-end sequencing data (8), for the analysis of my global profiling data from
these  IBC  cell  line  samples.  While  little  is  known  about  the  molecular  origins  of  inflammatory 
breast  cancer,  we  have  made  significant  advances  not  only  in  the  acquisition  of  large  profiling 
data  sets  of  DNA  copy  number,  microRNA  expression  and  transcriptome  sequencing,  but  also  in 
software  development  to  analyze  this  data.  Currently,  we  are  in  the  process  of  completing  an 
integrated  analysis  from  all  three  profiling  platforms.  Additionally,  we  have  unexpectedly  found 
that  the  two  IBC  cell  lines  SUM149  and  SUM190  have  an  extra  copy  of  chromosome  1.  Because 
several  other  stage  matched  breast  cancer  cell  lines  do  not  have  this  extra  copy  of  chromosome  1, 
we  are  exploring  the  occurrence  of  chromosome  1  amplification  in  IBC  clinical  samples.  The 
significance  of  this  finding  is  still  unclear,  but  will  be  explored  in  more  detail  if  a  clinical 
correlation  is  observed. 

The  opportunity  to  work  on  developing  novel  techniques  and  protocols  for  this  project  has  led 
directly  to  opportunities  to  improve  my  communication  and  professional  skills.  Within  the  last 
two  years,  I  have  presented  some  of  the  work  at  the  American  Association  for  Cancer  Research 
Meeting in Denver, Colorado (April 2009 and April 2010). At those meetings, I was a co-author
or first author on six posters on the role of RhoC GTPases in IBC and other breast cancers,
as well as co-author on an abstract that I presented as a podium presentation. Additionally, this
research led to a scholarship to attend a Keystone conference in Victoria, British Columbia. For
this  meeting,  I  wrote  a  meeting  summary  that  was  published  as  part  of  the  conference 
proceedings.  Following  the  research  for  this  project,  I  received  an  independent  nomination  to 
become an American Association for Cancer Research Associate council member. Finally, this
research  has  led  directly  to  the  generation  of  preliminary  data  that  was  used  to  produce  a  grant, 
which  I  co-authored  and  is  funded  through  the  Susan  G.  Komen  foundation  N012788-00  (11- 
PAF00190). 

Key  Research  Accomplishments: 

Specific  Aim  1:  To  delineate  if  and  how  gene  amplification  in  RhoC  GTPase  occurs  in  breast 
cancer  and  to  identify  novel  gene  fusions  in  inflammatory  breast  cancer. 

•  Completed  RhoC  FISH  and  discovered  that  IBC  cell  lines  do  not  have  amplification  of 
the  RhoC  locus,  but  carry  an  extra  copy  of  chromosome  1.  (Figure  1) 

•  Acquired  244k  Agilent  aCGH  data  for  several  cell  lines  including  the  two  IBC  cell  lines, 
SUM149 and SUM190. Recurrent aberrations between the two IBC cell lines were not
observed. 

•  Completed  the  Illumina  bead  station  microRNA  profiling  chip  V2  of  cell  line  panel 
including  HME,  MCF10A,  SUM149,  SUM190,  HCC1937  and  BT20.  This  led  to  the 
identification of hsa-miR-31 and the anti-sense hsa-miR-31* as downregulated in the IBC
cell lines, but not the control cell lines (Table 1). This observation was confirmed using
Taqman qPCR probes to analyze mature miR-31 and miR-31* expression across the
panel of cell lines. Because miR-31 has recently been shown to suppress metastatic breast
cancer (Valastyan et al., 2009), we are currently exploring the specific role of miR-31 in
IBC. 

• Sequenced the RNA transcriptome of both SUM149 and SUM190 using massively
parallel,  high  throughput  paired-end  sequencing  on  a  SOLEXA  GA2  from  Illumina. 

While  we  found  and  pursued  several  gene  fusions,  we  were  unable  to  identify  any 
recurrent  gene  fusions  using  our  integrated  techniques.  As  such  we  have  decided  to 
pursue  the  SOLEXA  data  in  more  detail  by  analyzing  the  role  of  non-coding  RNAs 
(ncRNAs)  in  both  IBC  and  highly  metastatic  breast  cancer.  To  do  this,  we  are  attempting 
to  identify  and  validate  ncRNAs  that  are  specifically  expressed  in  either  IBC,  triple 
negative  or  metastatic  breast  cancer.  Additionally,  we  have  generated  ChIP-Sequencing 
(ChIP-Seq) libraries of 17-β-estradiol-treated MCF7 and BT474 cells in order to assess
which of these ncRNAs may be estrogen responsive. To demonstrate the success of these
experiments, Figure 2 shows ChIP-Seq coverage maps of the estrogen-regulated gene
GREB1 from MCF7 and BT474 cell lines starved for 48 hours and then treated with
either 1 nM 17-β-estradiol or vehicle for 48 hours. ChIP assays were performed with
antibodies against ERα or a histone mark of activated transcription, H3K4-Me3.

Specific Aim 2: To determine how DNA methylation status and histone modifications regulate
the  RhoC  GTPase  promoter,  and  to  assess  the  ability  of  the  small  molecule  drugs  5-azacytidine 
and  Trichostatin  A  to  alter  the  metastatic  phenotype  depicted  by  an  IBC  cell  line  model. 

•  Completed  Illumina  bead  station  microRNA  profiling  chip  V2  of  cell  line  panel  including 
HME,  MCF10A,  SUM149,  SUM190,  MDA-MB-231,  HCC1937  and  BT20  treated  with 
5-azacytidine  or  Trichostatin  A. 

• Prepared RNA transcriptome libraries of both SUM149 and SUM190 treated with
5-azacytidine or Trichostatin A for sequencing on an Illumina SOLEXA GA2.

• Treatment of MCF10A and HME cells with either 5-azacytidine or Trichostatin A
revealed no significant increase in RhoC mRNA expression, suggesting that the molecular
mechanism leading to RhoC overexpression does not involve the activation of genes
repressed by either methylation or deacetylation.

Specific Aim 3: To characterize the consequences of down regulating the expression of the
transcription factors FoxP3, HoxA3, HoxB7, HoxB8, HoxD9, HoxD10, CREB and NFkB1,
all  of  which  contain  highly  conserved  binding  sites  in  the  putative  RhoC  GTPase  promoter,  on 
molecular  pathways  regulating  cell  proliferation,  survival  and  the  metastatic  phenotype,  using 
an  RNAi  model  system  of  human  IBC  cell  lines. 

• Established stable shRNA knockdown cell lines for FoxP3, HoxA3, HoxB7, HoxB8,
HoxD9, HoxD10, CREB and NFkB1 in SUM149 cells.

• Identified NFkB1 as a key regulator of RhoC mRNA and protein expression in SUM149
and SUM190 cells. (Figure 3)

• Completed chromatin immunoprecipitation assays that demonstrated enhanced p65
binding at two of three putative NFkB1 binding sites in the RhoC promoter. This binding pattern
was unique to SUM149 cells. (Figure 4A and B)

• Established a 4.0 kbp RhoC promoter reporter system. Importantly, transient transfection
assays with this promoter reporter system demonstrated increased activity in the SUM149
cells, but not in MCF10A cells. This suggests that the RhoC promoter activity is
deregulated in IBC, leading to RhoC overexpression. (Figure 4C)

• Developed site mutants of the RhoC promoter reporter system.

•  Demonstrated  that  downregulation  of  p65  in  IBC  cells  leads  to  loss  of  cell  motility  and 
invasion.  (Figure  5) 

Specific Aim 4: To determine the distribution and stability of RhoC GTPase transcription
variants  in  altering  the  half-life  of  the  different  mRNAs,  thereby,  regulating  the  total  RhoC 
GTPase  protein  expression. 

• Established RhoC and GAPDH probes for northern blot analysis. This experiment
demonstrated that RhoC mRNA decay does not differ between IBC and non-IBC
control cell lines.

Reportable  outcomes: 

•  Published  a  manuscript  detailing  the  methodology  for  identification  of  gene  fusions  in 
epithelial  cancers,  “Chimeric  transcript  discovery  by  paired-end  transcriptome 
sequencing.”  (8) 

•  Published  a  review  titled,  “Translocations  in  epithelial  cancers.”  (9) 

•  A  manuscript  was  accepted  for  publication  at  Mol.  Cancer  Res,  “RhoC  Expression  and 
Head  and  Neck  Cancer  Metastasis”  (In  Press) 

•  Completed  a  book  chapter  that  was  accepted  for  publication,  “The  Rho  GTPases  in 
Cancer”  (In  Press) 

•  Manuscript  in  preparation,  “p65  drives  RhoC  GTPase  expression  and  the  metastatic 
phenotype  in  Inflammatory  Breast  Cancer” 

•  Research  has  led  to  an  additional  breast  cancer  grant  that  I  co-authored  through  the  Susan 
G.  Komen  foundation. 
Conclusions:

Since  the  submission  of  the  original  application  and  initiation  of  the  DOD  breast  cancer  training 
program,  I  have  completed  the  core  courses  in  Genetics,  Biochemistry,  Cell  Biology  and  Ethics 
required  by  the  University’s  CMB  program  as  well  as  comprehensive  courses  in  Cancer  Biology, 
Pharmacology,  Proteomics,  Bioinformatics  of  Sequence  Alignment  and  Mathematical  Models  in 
Biology.  I  have  completed  a  comprehensive  preliminary  exam  on  a  subject  unrelated  to  this 
DOD award (my thesis project) as required by the CMB program. I have been first author or
co-author on three manuscripts and one book chapter accepted for publication on work directly
stemming from this DOD breast cancer award. Finally, I have co-authored a grant emanating
from this research, which was funded by the Susan G. Komen foundation.

Figures 

Figure  1.  Fluorescence  in  situ  hybridization  (FISH)  using  a  RHOC  locus  probe.  Normal 
breast tissue is shown on the left, alongside the IBC cell lines SUM149 and SUM190. An
interphase spread of SUM190 is shown. In both SUM149 and SUM190 cells, three copies of
chromosome 1 are present, leading to the additional copy of RHOC, as confirmed by additional
cytogenetic analysis using a centromeric probe for chromosome 1. Representative images are
shown. 

Figure 2. ChIP-Seq positive control analysis. Chromatin immunoprecipitation-sequencing
(ChIP-Seq) using anti-ERα or anti-H3K4-tri-methylation antibodies on MCF7 or BT474 cells
treated with or without 1 nM 17-β-estradiol as indicated. Plots show read accumulations in reads
per kilobase per million; reads were aligned to the genome using HPEAK software as previously described
(Yu et al., 2010). Analysis of the GREB1 locus reveals increased binding of ERα and
H3K4-tri-methylation in both cell lines following stimulation with 17-β-estradiol.

Figure  3.  p65  regulates  RhoC  mRNA  expression  in  SUM149  cells.  Following  the  targeted 
shRNA screen, p65 was identified as a potential regulator of RhoC mRNA expression. qPCR
analysis of SUM149 cells treated with p65 siRNA demonstrates that RhoC mRNA expression
decreases with p65 knockdown. p65 knockdown was confirmed and IL6, a known target of p65,
was used to demonstrate functional p65 knockdown. Importantly, the p65 siRNA did not alter
p105 mRNA expression. Reactions were run in quadruplicate three times. Standard deviation is
shown  in  the  error  bars. 

Figure 4. p65 binds to the RHOC promoter. A) Schematic shows putative p65 binding sites in
the RHOC proximal promoter. B) ChIP analysis of p65 binding in HME, MCF10A, SUM149
and MDA-MB-231 cells demonstrates that p65 is enriched in the SUM149 cell line at p65
binding sites 1 and 3, but not in the control cell lines. C) RHOC promoter reporter activity
demonstrates that the RHOC promoter, but not an empty vector control, is highly active in the
SUM149 cell line, but not in MCF10A cells. Data are shown relative to a Renilla control used to
normalize for transfection efficiency. All experiments were run in triplicate and standard
deviation is shown on the bar plots.

Figure 5. p65 expression is required for SUM149 cell motility and invasion. A)
Representative photomicrographs of cell motility assays in SUM149 cells treated with shRNA as
indicated. B) Quantification of cell motility assays. C) As in A, except Boyden chamber
transwell migration assays. Chambers were coated with 100 µL Matrigel 4 hours prior to seeding
cells in serum-free media. Forty-eight hours later, representative images were taken to assess
invasion through 8.0 µm pores. Cells were stained with crystal violet. D) Quantification of cell
invasion. Cells were released from the membrane with acetic acid and quantified by colorimetric
analysis at 560 nm. Percent maximum invasion is shown. All experiments were run in triplicate.

Table  1.  Analysis  of  microRNA  array  data.  MicroRNAs  that  were  greater  than  two-fold  down- 
or  up-regulated  were  compared  across  cell  lines  to  identify  microRNAs  that  were  recurrently 
differential  among  IBC  cell  lines,  but  not  several  other  control  cell  lines. 
Recurrent gene fusions are a prevalent class of mutations arising from
the  juxtaposition  of  2  distinct  regions,  which  can  generate  novel 
functional  transcripts  that  could  serve  as  valuable  therapeutic  targets 
in  cancer.  Therefore,  we  aim  to  establish  a  sensitive,  high-throughput 
methodology  to  comprehensively  catalog  functional  gene  fusions  in 
cancer  by  evaluating  a  paired-end  transcriptome  sequencing  strategy. 
Not  only  did  a  paired-end  approach  provide  a  greater  dynamic  range 
in  comparison  with  single  read  based  approaches,  but  it  clearly 
distinguished  the  high-level  "driving"  gene  fusions,  such  as  BCR-ABL1 
and  TMPRSS2-ERG,  from  potential  lower  level  "passenger"  gene 
fusions. Also, the comprehensiveness of a paired-end approach enabled
the discovery of 12 previously undescribed gene fusions in 4
commonly  used  cell  lines  that  eluded  previous  approaches.  Using  the 
paired-end  transcriptome  sequencing  approach,  we  observed  read- 
through  mRNA  chimeras,  tissue-type  restricted  chimeras,  converging 
transcripts,  diverging  transcripts,  and  overlapping  mRNA  transcripts. 
Last,  we  successfully  used  paired-end  transcriptome  sequencing  to 
detect  previously  undescribed  ETS  gene  fusions  in  prostate  tumors. 
Together, this study establishes a highly specific and sensitive approach
for accurately and comprehensively cataloguing chimeras
within  a  sample  using  paired-end  transcriptome  sequencing. 

bioinformatics  |  gene  fusions  |  prostate  cancer  |  breast  cancer  |  RNA-Seq 

One  of  the  most  common  classes  of  genetic  alterations  is  gene 
fusions,  resulting  from  chromosomal  rearrangements  (1). 
Intriguingly,  >80%  of  all  known  gene  fusions  are  attributed  to 
leukemias,  lymphomas,  and  bone  and  soft  tissue  sarcomas  that 
account  for  only  10%  of  all  human  cancers.  In  contrast,  common 
epithelial  cancers,  which  account  for  80%  of  cancer-related  deaths, 
account for only 10% of known recurrent gene fusions
(2-4). However, the recent discovery of a recurrent gene fusion,
TMPRSS2-ERG, in a majority of prostate cancers (5, 6), and
EML4-ALK in non-small-cell lung cancer (NSCLC) (7), has expanded
the realm of gene fusions as an oncogenic mechanism in
common solid cancers. Also, the restricted expression of gene
fusions to cancer cells makes them desirable therapeutic targets.
One successful example is imatinib mesylate, or Gleevec, which
targets BCR-ABL1 in chronic myeloid leukemia (CML) (8-10).
Therefore,  the  identification  of  novel  gene  fusions  in  a  broad  range 
of  cancers  is  of  enormous  therapeutic  significance. 

The  lack  of  known  gene  fusions  in  epithelial  cancers  has  been 
attributed to their clonal heterogeneity and to the technical limitations
of cytogenetic analysis, spectral karyotyping, FISH, and
microarray-based  comparative  genomic  hybridization  (aCGH).  Not 
surprisingly,  TMPRSS2-ERG  was  discovered  by  circumventing 
these  limitations  through  bioinformatics  analysis  of  gene  expression 
data  to  nominate  genes  with  marked  overexpression,  or  outliers,  a 
signature  of  a  fusion  event  (6).  Building  on  this  success,  more  recent 
strategies  have  adopted  unbiased  high-throughput  approaches,  with 
increased  resolution,  for  genome-wide  detection  of  chromosomal 
rearrangements  in  cancer  involving  BAC  end  sequencing  (11), 
fosmid paired-end sequences (12), serial analysis of gene expression
(SAGE)-like sequencing (13), and next-generation DNA sequencing
(14). Although these approaches have unveiled many novel genomic rearrangements,
solid tumors accumulate multiple nonspecific aberrations throughout
tumor progression, thus making causal and driver aberrations
indistinguishable from secondary and insignificant mutations,
respectively.

The deep unbiased view of a cancer cell enabled by massively
parallel transcriptome sequencing has greatly facilitated gene fusion
discovery. As shown in our previous work, integrating long and
short read transcriptome sequencing technologies was an effective
approach for enriching “expressed” fusion transcripts (15). However,
despite the success of this methodology, it required substantial
overhead to leverage 2 sequencing platforms. Therefore, in this
study, we adopted a single platform paired-end strategy to comprehensively
elucidate novel chimeric events in cancer transcriptomes.
Not only was using this single platform more economical, but
it allowed us to more comprehensively map chimeric mRNA, hone
in on driver gene fusion products due to its quantitative nature, and
observe rare classes of transcripts that were overlapping, diverging,
or converging.

Results 

Chimera Discovery via Paired-End Transcriptome Sequencing. Here,
we employ transcriptome sequencing to restrict chimera nominations
to “expressed sequences,” thus enriching for potentially
functional mutations. To evaluate massively parallel paired-end
transcriptome sequencing to identify novel gene fusions, we generated
cDNA libraries from the prostate cancer cell line VCaP,
CML cell line K562, universal human reference total RNA (UHR;
Stratagene), and human brain reference (HBR) total RNA (Ambion).
Using the Illumina Genome Analyzer II, we generated 16.9
million VCaP, 20.7 million K562, 25.5 million UHR, and 23.6
million HBR transcriptome mate pairs (2 × 50 nt). The mate pairs
were mapped against the transcriptome and categorized as (i)
mapping to same gene, (ii) mapping to different genes (chimera
candidates), (iii) nonmapping, (iv) mitochondrial, (v) quality control,
or (vi) ribosomal (Table S1). Overall, the chimera candidates
represent a minor fraction of the mate pairs, comprising <1% of
the  reads  for  each  sample. 
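
The six-way categorization described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in the study: the `Mate` record, the small gene lists used to flag mitochondrial and ribosomal genes, and the order in which the checks are applied are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical gene lists for illustration only; a real analysis would take
# these from a genome annotation.
MITOCHONDRIAL_GENES = {"MT-CO1", "MT-ND1"}
RIBOSOMAL_GENES = {"RPL13", "RPS18"}


@dataclass(frozen=True)
class Mate:
    """One end of a mate pair (hypothetical record, not from the paper)."""
    gene: Optional[str]      # gene the read mapped to, or None if unmapped
    passed_qc: bool = True   # False if the read failed quality control


def categorize(first: Mate, second: Mate) -> str:
    """Assign a mate pair to one of the six categories named in the text.

    The precedence of the checks is an assumption of this sketch.
    """
    if not (first.passed_qc and second.passed_qc):
        return "quality control"
    if first.gene is None or second.gene is None:
        return "nonmapping"
    genes = {first.gene, second.gene}
    if genes & MITOCHONDRIAL_GENES:
        return "mitochondrial"
    if genes & RIBOSOMAL_GENES:
        return "ribosomal"
    if first.gene == second.gene:
        return "same gene"
    return "chimera candidate"


# Toy mate pairs, one per category.
pairs = [
    (Mate("TMPRSS2"), Mate("TMPRSS2")),
    (Mate("TMPRSS2"), Mate("ERG")),
    (Mate("BCR"), Mate(None)),
    (Mate("MT-CO1"), Mate("MT-CO1")),
    (Mate("RPL13"), Mate("RPL13")),
    (Mate("ERG", passed_qc=False), Mate("ERG")),
]
counts = Counter(categorize(a, b) for a, b in pairs)
chimera_fraction = counts["chimera candidate"] / len(pairs)
print(counts)
print(f"chimera candidate fraction: {chimera_fraction:.3f}")
```

Only pairs whose two mates map to different genes are carried forward as chimera candidates; the nonmapping pairs are kept separately because, as described below, they may contain reads that span a fusion junction.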

We  believe  that  a  paired-end  strategy  offers  multiple  advantages 
over  single  read  based  approaches  such  as  alleviating  the  reliance 
on  sequencing  the  reads  traversing  the  fusion  junction,  increased 
coverage provided by sequencing reads from the ends of a transcribed
fragment, and the ability to resolve ambiguous mappings
(Fig. S1). Therefore, to nominate chimeras, we leveraged each of
these aspects in our bioinformatics analysis. We focused on both
mate pairs encompassing and/or spanning the fusion junction by
analyzing 2 main categories of sequence reads: chimera candidates
and nonmapping (Fig. S2A). The resulting chimera candidates from
the nonmapping category that span the fusion boundary were
merged with the chimeras found to encompass the fusion boundary
revealing 119, 144, 205, and 294 chimeras in VCaP, K562, HBR, and
UHR,  respectively. 

Comparison  of  a  Paired-End  Strategy  Against  Existing  Single  Read 
Approaches. To assess the merit of adopting a paired-end transcriptome
approach, we compared the results against existing single read
approaches. Although current RNA sequencing (RNA-Seq) studies
have been using 36-nt single reads (16, 17), we increased the
likelihood  of  spanning  a  fusion  junction  by  generating  100-nt  long 
single  reads  using  the  Illumina  Genome  Analyzer  II.  Also,  we  chose 
this  length  because  it  would  facilitate  a  more  comparable  amount 
of  sequencing  time  as  required  for  sequencing  both  50-nt  mate 
pairs.  In  total,  we  generated  7.0,  59.4,  and  53.0  million  100-nt 
transcriptome  reads  for  VCaP,  UHR,  and  HBR,  respectively,  for 
comparison  against  paired-end  transcriptome  reads  from  matched 
samples. 

Because  the  UHR  is  a  mixture  of  cancer  cell  lines,  we  expected 
to  find  numerous  previously  identified  gene  fusions.  Therefore,  we 
first  assessed  the  depth  of  coverage  of  a  paired-end  approach 
against  long  single  reads  by  directly  comparing  the  normalized 
frequency  of  sequence  reads  supporting  4  previously  identified  gene 
fusions [TMPRSS2-ERG (5, 6), BCR-ABL1 (18), BCAS4-BCAS3
(19), and ARFGEF2-SULF2 (20)]. As shown in Fig. 1A, we observed
a marked enrichment of paired-end reads compared with
long single reads for each of these well characterized gene fusions.

We observed that TMPRSS2-ERG had a >10-fold enrichment
between paired-end and single read approaches. The schematic
representation in Fig. 1B indicates the distribution of reads confirming
the TMPRSS2-ERG gene fusion from both paired-end and
single  read  sequencing.  As  expected,  the  longer  reads  improve  the 
number  of  reads  spanning  known  gene  fusions.  For  example,  had 
we  sequenced  a  single  36-mer  (shown  in  red  text),  11  of  the  17 
chimeras,  shown  in  the  bottom  portion  of  the  long  single  reads, 
would  not  have  spanned  the  gene  fusion  boundary,  but  instead, 
would  have  terminated  before  the  junction  and,  therefore,  only 
aligned  to  TMPRSS2.  However,  despite  the  improved  results  only 
17  chimeric  reads  were  generated  from  7.0  million  long  single  read 
sequences.  In  contrast,  paired-end  sequencing  resulted  in  552  reads 
supporting the TMPRSS2-ERG gene fusion from ≈17 million
sequences. 
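
The >10-fold figure can be checked with simple arithmetic on the counts quoted above. The sketch below assumes that "normalized frequency" means supporting reads per million sequences, and it uses the 16.9 million VCaP mate pairs reported earlier for the "≈17 million" paired-end sequences; both choices are assumptions of this example rather than details stated in the text.

```python
def reads_per_million(supporting_reads: int, total_millions: float) -> float:
    """Supporting reads per million sequenced reads (assumed normalization)."""
    return supporting_reads / total_millions


# Counts quoted in the text for TMPRSS2-ERG in VCaP.
single_rpm = reads_per_million(17, 7.0)     # 100-nt single reads
paired_rpm = reads_per_million(552, 16.9)   # 2 x 50-nt mate pairs
fold_enrichment = paired_rpm / single_rpm

print(f"single read: {single_rpm:.1f} supporting reads per million")
print(f"paired-end:  {paired_rpm:.1f} supporting reads per million")
print(f"fold enrichment: {fold_enrichment:.1f}")
```

Under these assumptions the paired-end library yields roughly 13 times as many supporting reads per million sequences as the long single reads, consistent with the >10-fold enrichment stated above.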

Because  we  are  using  sequence  based  evidence  to  nominate  a 
chimera, we hypothesized that the approach providing the maximum
nucleotide coverage is more likely to capture a fusion junction.
We calculated an in silico insert size for each sample using
mate pairs aligning to the same gene, and found a mean insert size
of ≈200 nt. Then, we compared the total coverage from single reads
(coverage is equivalent to the total number of pass filter reads
multiplied by the read length) with the paired-end approach (coverage is
equivalent to the sum of the insert size and the length of each read)
(Fig. S2B). Overall, we observed an average coverage of 848.7 and
757.3 MB using single read technology, compared with 2,553.3 and
2,363 MB from paired-end in UHR and HBR, respectively. This
≈3-fold increase in coverage in the paired-end samples compared
with  the  long  read  approach,  per  lane,  could  explain  the  increased 
dynamic  range  we  observed  using  a  paired-end  strategy. 
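
The coverage comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, the input counts are chosen only for illustration, and the paired-end formula assumes that the in silico insert size excludes the two reads, which is one possible reading of the definition given in the text.

```python
def single_read_coverage_mb(n_reads: int, read_length: int) -> float:
    """Nucleotides covered by single reads, in megabases."""
    return n_reads * read_length / 1e6


def paired_end_coverage_mb(n_mate_pairs: int, insert_size: int,
                           read_length: int) -> float:
    """Nucleotides covered by mate pairs, in megabases.

    Assumption: each mate pair covers the insert plus both reads.
    """
    return n_mate_pairs * (insert_size + 2 * read_length) / 1e6


# Hypothetical inputs, for illustration only (not the study's lane counts).
single_mb = single_read_coverage_mb(n_reads=7_000_000, read_length=100)
paired_mb = paired_end_coverage_mb(n_mate_pairs=9_000_000,
                                   insert_size=200, read_length=36)
fold_increase = paired_mb / single_mb
```

With these illustrative inputs, the paired-end estimate exceeds the single read estimate severalfold, mainly because each mate pair also accounts for the unsequenced insert.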

Next, we wanted to identify chimeras common to both strategies.
The long read approach nominated 1,375 and 1,228 chimeras,
whereas with a paired-end strategy, we only nominated 225 and 144
chimeras in UHR and HBR, respectively. As shown in the Venn
diagram (Fig. 1C), there were 32 and 31 candidates common to both
technologies for UHR and HBR, respectively. Within the common
UHR chimeric candidates, we observed previously identified gene
fusions BCAS4-BCAS3, BCR-ABL1, ARFGEF2-SULF2, and
RPS6KB1-TMEM49 (13). The remaining chimeras, nominated by
both approaches, represent a high fidelity set. Therefore, to further
assess whether a paired-end strategy has an increased dynamic
range, we compared the ratio of normalized mate pair reads against
single reads for the remaining chimeras common to both
technologies. We observed that 93.5 and 93.9% of UHR and HBR
candidates, respectively, had a higher ratio of normalized mate pair
reads to single reads (Table S2), confirming the increased dynamic
range offered by a paired-end strategy. We hypothesize that the
greater number of nominated candidates specific to the long read
approach represents an enrichment of false positives, as observed
when using the 454 long read technology (15, 21).
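
The ratio comparison for the shared candidates can be illustrated as follows. This is a hedged sketch: the function name, the chimera labels, and the normalized counts are all hypothetical, and a candidate is taken to favor the paired-end strategy when its normalized mate pair support exceeds its normalized single read support.

```python
def fraction_favoring_paired_end(candidates):
    """Return the fraction of candidates with more normalized mate pair
    support than single read support, plus the names of those candidates.

    candidates: dict mapping a chimera name to a tuple of
    (normalized_mate_pairs, normalized_single_reads); both values are
    assumed positive because the chimera was found by both strategies.
    """
    favored = [name for name, (paired, single) in candidates.items()
               if paired / single > 1.0]
    return len(favored) / len(candidates), favored


# Hypothetical normalized counts, for illustration only.
shared = {
    "chimera_A": (12.0, 1.5),
    "chimera_B": (3.0, 0.5),
    "chimera_C": (0.4, 0.9),
    "chimera_D": (8.0, 2.0),
}
fraction, favored_names = fraction_favoring_paired_end(shared)
```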

Paired-End Approach Reveals Novel Gene Fusions. We were interested
in determining whether the paired-end libraries could detect
novel gene fusions. Among the top chimeras nominated from
VCaP, HBR, UHR, and K562, many were already known, including
TMPRSS2-ERG, BCAS4-BCAS3, BCR-ABL1, USP10-ZDHHC7,
and ARFGEF2-SULF2. Also ranking among these well known gene
fusions in UHR was a fusion on chromosome 13 between GAS6 and
RASA3 (Fig. S3A and Table S2). The fact that GAS6-RASA3
ranked higher than BCR-ABL1 suggests that it may be a driving
fusion in one of the cancer cell lines in the RNA pool.

Another observation was that there were 2 candidates among the
top 10 found in both UHR and K562. This observation was
intriguing, because hematological malignancies are not considered
to have multiple gene fusion events. In addition to BCR-ABL1, we
were able to detect a previously undescribed interchromosomal
gene fusion between exon 23 of NUP214, located at chromosome
9q34.13, with exon 2 of XKR3, located at chromosome 22q11.1. These
genes reside on chromosomes 9 and 22 in close proximity
to ABL1 and BCR, respectively (Fig. S3B). We confirmed the
presence of NUP214-XKR3 in K562 cells using qRT-PCR, but were
unable to detect it across an additional 5 CML cell lines tested
(SUP-B15, MEG-01, KU812, GDM-1, and Kasumi-4) (Fig. S3C).
These results suggest that NUP214-XKR3 is a "private" fusion that
originated from additional complex rearrangements after the
translocation that generated BCR-ABL1 and a focal amplification of
both gene regions.

Although we were able to detect BCR-ABL1 and NUP214-XKR3
in both UHR and K562, there was a marked reduction in
the mate pairs supporting these fusions in UHR. Although a
diluted signal is expected, because UHR comprises pooled samples, it
provides evidence that pooling samples can serve as a useful
approach for nominating top expressing chimeras, and potentially
enrich for "driver" chimeras.

Previously Undescribed Prostate Gene Fusions. Our previous work
using integrative transcriptome sequencing to detect gene fusions in
cancer revealed multiple gene fusions, demonstrating the complexity
of the prostate transcriptomes of VCaP and LNCaP (15). Here,
we exploit the comprehensiveness of a paired-end strategy on the
same cell lines to reveal novel chimeras. In the circular plot shown
in Fig. S4A, we displayed all experimentally validated paired-end
chimeras in the larger red circle. We found that all of the previously
discovered chimeras in VCaP and LNCaP comprised a subset of the
paired-end candidates, as displayed in the inner black circle.

As expected, TMPRSS2-ERG was the top VCaP candidate. In
addition to "rediscovering" the USP10-ZDHHC7, HJURP-INPP4A,
and EIF4E2-HJURP gene fusions, a paired-end approach revealed
several previously undescribed gene fusions in VCaP. One such
example was an interchromosomal gene fusion between ZDHHC7,
on chromosome 16, with ABCB9, residing on chromosome 12, that
was validated by qRT-PCR (Fig. S3D). Interestingly, the 5' partner,
ZDHHC7, had previously been validated as a complex
intrachromosomal gene fusion with USP10 (15). Both fusions have mate
pairs aligning to the same exon of ZDHHC7 (15), suggesting that
their breakpoints are in adjacent introns (Fig. S3D).

12354 | www.pnas.org/cgi/doi/10.1073/pnas.0904720106

Maher et al.

Fig. 1. Dynamic range and sensitivity of the paired-end transcriptome analysis relative to single read approaches. (A) Comparison of paired-end (blue) and long single
transcriptome reads (black) supporting known gene fusions TMPRSS2-ERG, BCR-ABL1, BCAS4-BCAS3, and ARFGEF2-SULF2. (B) Schematic representation of TMPRSS2-ERG
in VCaP, comparing mate pairs with long single transcriptome reads. (Upper) Frequency of mate pairs, shown in log scale, are divided based on whether they
encompass or span the fusion boundary; (Lower) 100-mer single transcriptome reads spanning TMPRSS2-ERG fusion boundary. First 36 nt are highlighted in red. (C)
Venn diagram of chimera nominations from both a paired-end (orange) and long single read (blue) strategy for UHR and HBR.

Another previously undescribed VCaP interchromosomal gene
fusion that we discovered was between exon 2 of TIA1, residing on
chromosome 2, with exon 3 of DIRC2, or disrupted in renal
carcinoma 2, located on chromosome 3. TIA1-DIRC2 was validated
by qRT-PCR and FISH (Fig. S5). In total, we confirmed an
additional 4 VCaP and 2 LNCaP chimeras (Fig. S6). Overall, these
fusions demonstrate that paired-end transcriptome sequencing can
nominate candidates that have eluded previous techniques, including
other massively parallel transcriptome sequencing approaches.

Distinguishing Causal Gene Fusions from Secondary Mutations. We
were next interested in determining whether the dynamic range
provided by paired-end sequencing can distinguish known high-level
"driving" gene fusions, such as known recurrent gene fusions
BCR-ABL1 and TMPRSS2-ERG, from lower level "passenger"
fusions. Therefore, we plotted the normalized mate pair coverage
at the fusion boundary for all experimentally validated gene fusions
for the 2 cell lines that we sequenced harboring recurrent gene
fusions, VCaP and K562. As shown in Fig. S4B, we observed that
both driver fusions, TMPRSS2-ERG and BCR-ABL1, show the
highest expression among the validated chimeras in VCaP and
K562, respectively. This observation suggests a paired-end
nomination strategy for selecting putative driver gene fusions among
private nonspecific gene fusions that lack detectable levels of
expression across a panel of samples (15).

Fig. 2. RNA based chimeras. (A) Heatmaps showing the normalized number of reads supporting each read-through chimera across samples ranging from 0 (white)
to 30 (red). (Upper) The heatmap highlights broadly expressed chimeras in UHR, HBR, VCaP, and K562. (Lower) The heatmap highlights the expression of the top
ranking restricted gene fusions that are enriched with interchromosomal and intrachromosomal rearrangements. (B) Illustrative examples classifying RNA-based
chimeras into (i) read-throughs, (ii) converging transcripts, (iii) diverging transcripts, and (iv) overlapping transcripts. (C Upper) Paired-end approach links reads from
independent genes as belonging to the same transcriptional unit (Right), whereas a single read approach would assign these reads to independent genes (Left).
(Lower) The single read approach requires that a chimera span the fusion junction (Left), whereas a paired-end approach can link mate pairs independent of gene
annotation (Right).

Previously Undescribed Breast Cancer Gene Fusions. Our ability to
detect previously undescribed prostate gene fusions in VCaP and
LNCaP demonstrated the comprehensiveness of paired-end
transcriptome sequencing compared with an integrated approach, using
short and long transcriptome reads. Therefore, we extended our
paired-end analysis by using breast cancer cell line MCF-7, which
has been mined for fusions using numerous approaches such as
expressed sequence tags (ESTs) (22), array CGH (23), single
nucleotide polymorphism arrays (24), gene expression arrays (25),
end sequence profiling (20, 26), and paired-end diTag (PET) (13).

A histogram (Fig. S4C) of the top ranking MCF-7 candidates
highlights BCAS4-BCAS3 and ARFGEF2-SULF2 as the top 2 ranking
candidates, whereas other previously reported candidates, such
as SULF2-PRICKLE, DEPDC1B-ELOVL7, RPS6KB1-TMEM49,
and CXorf15-SYAP1, were interspersed among a comprehensive list
of previously undescribed putative chimeras. To confirm that these
previously undescribed nominations were not false positives, we
experimentally validated 2 interchromosomal and 3
intrachromosomal candidates using qRT-PCR (Fig. S6). Overall, not only
was a paired-end approach able to detect gene fusions that have eluded
numerous existing technologies, it also revealed 5 previously
undescribed mutations in breast cancer.

RNA-Based Chimeras. Although many of the inter- and
intrachromosomal rearrangements that we nominated were found within a
single sample, we observed many chimeric events shared across
samples. We identified 11 chimeric events common to UHR, VCaP,
K562, and HBR (Table S3). Via heatmap representation (Fig. 2A)
of the normalized frequency of mate pairs supporting each chimeric
event, we can observe these events are broadly transcribed in
contrast to the top restricted chimeric events. Also, we found that
100% of the broadly expressed chimeras resided adjacent to one
another on the genome, whereas only 7.7% of the restricted
candidates were neighboring genes. This discrepancy can be
explained by the enrichment of inter- and intrachromosomal
rearrangements in the restricted set.

Unlike previously characterized restricted read-throughs, such
as SLC45A3-ELK4 (15), which are found adjacent to one another,
but in the same orientation, we found that the majority of the
broadly expressed chimera candidates resided adjacent to one
another in different orientations. Therefore, we have categorized
these events as (i) read-throughs, adjacent genes in the same
orientation, (ii) diverging genes, adjacent genes in opposite
orientation whose 5' ends are in close proximity, (iii) convergent genes,
adjacent genes in opposite orientation whose 3' ends are in close
proximity, and (iv) overlapping genes, adjacent genes that share
common exons (Fig. 2B). Based on this classification, we found 1
read-through, 2 convergent genes, 6 divergent genes, and 2
overlapping genes. Also, we found that ~81.8% of these chimeras had
at least 1 supporting EST, providing independent confirmation of
the event (Table S3). In contrast to paired-end, single read
approaches would likely miss these instances as each mate would have
aligned to their respective genes based on the current annotations
(Fig. 2C). Also, these instances may represent extensions of a
transcriptional unit, which would not be detectable by a single read
approach that identifies chimeric reads that span exon boundaries
of independent genes. Overall, we believe that many of these
broadly expressed RNA chimeras represent instances where mate
pairs are revealing previously undescribed annotation for a
transcriptional unit.
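
The four categories above can be expressed as a simple decision rule on the strand and position of two adjacent genes. The sketch below is illustrative only: the `Gene` record, the coordinates, and the gene names are hypothetical, and the rule assumes that both genes lie on the same chromosome with coordinates on the forward strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int      # leftmost genomic coordinate
    end: int        # rightmost genomic coordinate
    strand: str     # "+" or "-"
    exons: tuple    # tuple of (start, end) exon intervals


def share_exons(a: Gene, b: Gene) -> bool:
    """True if any exon of a overlaps any exon of b."""
    return any(s1 <= e2 and s2 <= e1
               for s1, e1 in a.exons for s2, e2 in b.exons)


def classify_adjacent_pair(g1: Gene, g2: Gene) -> str:
    left, right = sorted((g1, g2), key=lambda g: g.start)
    if share_exons(left, right):
        return "overlapping"
    if left.strand == right.strand:
        return "read-through"
    # Opposite strands: a "-" gene on the left has its 5' end facing
    # the 5' end of a "+" gene on the right (divergent); otherwise the
    # two 3' ends face each other (convergent).
    if left.strand == "-" and right.strand == "+":
        return "divergent"
    return "convergent"


# Hypothetical adjacent genes, for illustration only.
gene_a = Gene("geneA", 1_000, 5_000, "-", ((1_000, 1_500), (4_500, 5_000)))
gene_b = Gene("geneB", 6_000, 9_000, "+", ((6_000, 6_400), (8_700, 9_000)))
category = classify_adjacent_pair(gene_a, gene_b)
```

Checking for shared exons first means that overlapping genes are reported as such regardless of their orientation.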




Previously Undescribed ETS Gene Fusions in Clinically Localized
Prostate Cancer. Given the high prevalence of gene fusions involving
ETS oncogenic transcription factor family members in prostate
tumors, we applied paired-end transcriptome sequencing for gene
fusion discovery in prostate tumors lacking previously reported
ETS fusions. For 2 prostate tumors, aT52 and aT64, we generated
6.2 and 7.4 million transcriptome mate pairs, respectively. In aT64,
we found that HERPUD1, residing on chromosome 16, juxtaposed
in front of exon 4 of ERG (Fig. 3A), which was validated by
qRT-PCR (Fig. S6) and FISH (Fig. 3B), thus identifying a third 5'
fusion partner for ERG, after TMPRSS2 (6) and SLC45A3 (27), and
presumably, HERPUD1 also mediates the overexpression of ERG
in a subset of prostate cancer patients. Also, just as TMPRSS2 and
SLC45A3 have been shown to be androgen regulated by qRT-PCR
(5), we found HERPUD1 expression, via RNA-Seq, to be
responsive to androgen treatment (Fig. S7). Also, ChIP-Seq analysis
revealed androgen binding at the 5' end of HERPUD1 (Fig. S7).

Also, in the second prostate tumor sample (aT52), we discovered
an interchromosomal gene fusion between the 5' end of a prostate
cDNA clone, AX747630 (FLJ35294), residing on chromosome 17,
with exon 4 of ETV1, located on chromosome 7 (Fig. 3C), which was
validated via qRT-PCR (Fig. S6) and FISH (Fig. 3D). Interestingly,
this fusion has previously been reported in an independent sample
found by a fluorescence in situ hybridization screen (27),
demonstrating that it is recurrent in a subset of prostate cancer
patients. As previously reported, gene expression via RNA-Seq
confirmed that AX747630 is an androgen-inducible gene (Fig. S7).
Also, ChIP-Seq revealed androgen occupancy at the 5' end of
AX747630 (Fig. S7).

Discussion 

This study demonstrates the effectiveness of paired-end massively
parallel transcriptome sequencing for fusion gene discovery. By
using a paired-end approach, we were able to rediscover known
gene fusions, comprehensively discover previously undescribed
gene fusions, and home in on causal gene fusions. The ability to
detect 12 previously undescribed gene fusions in 4 commonly used
cell lines that eluded any previous efforts conveys the superior
sensitivity of a paired-end RNA-Seq strategy compared with
existing approaches. Also, it suggests that we may be able to unveil
previously undescribed chimeric events in previously characterized
samples believed to be devoid of any known driver gene fusions, as
exemplified by the discovery of previously undescribed ETS gene
fusions in 2 clinically localized prostate tumor samples that lacked
known driver gene fusions.

By  analyzing  the  transcriptome  at  unprecedented  depth,  we  have 
revealed  numerous  gene  fusions,  demonstrating  the  prevalence  of 
a  relatively  under-represented  class  of  mutations.  However,  one  of 
the  major  goals  remains  to  discover  recurrent  gene  fusions  and  to 
distinguish  them  from  secondary,  nonspecific  chimeras.  Although 
quantifying  expression  levels  is  not  proof  of  whether  a  gene  fusion 
is  a  driver  or  passenger,  because  a  low-level  gene  fusion  could  still 
be causative, it is still of major significance that a paired-end strategy
clearly distinguished known high-level driving gene fusions, such as
BCR-ABL1 and TMPRSS2-ERG, from potential lower level
passenger chimeras. Overall, these fusions serve as a model for
employing a paired-end nomination strategy for prioritizing leads
likely to be high-level driving gene fusions, which would
subsequently undergo further functional and experimental evaluation.

Fig. 3. Discovery of previously undescribed ETS gene fusions in localized
prostate cancer. (A) Schematic representation of the interchromosomal gene
fusion between exon 1 of HERPUD1 (red), residing on chromosome 16, with exon
4 of ERG (blue), located on chromosome 21. (B) Schematic representation
showing genomic organization of HERPUD1 and ERG genes. Horizontal red and green
bars indicate the location of BAC clones. (Lower) FISH analysis using BAC clones
showing HERPUD1 and ERG in a normal tissue (Left), deletion of the ERG 5' region
in tumor (Center), and HERPUD1-ERG fusion in a tumor sample (Right). (C)
Schematic representation of the interchromosomal gene fusion between
FLJ35294 (green), residing on chromosome 17, with exon 4 of ETV1 (orange)
located on chromosome 7. (D Upper) Schematic representation of the genomic
organization of FLJ35294 and ETV1 genes. (Lower) FISH analysis using BAC clones
showing split of ETV1 in tumor sample (Left) and the colocalization of FLJ35294
and ETV1 in a tumor sample (Right).

One of the major advantages of using a transcriptome approach
is that it enables us to identify rearrangements that are not
detectable at the DNA level. For example, conventional cytogenetic
methods would miss gene fusions produced by paracentric
inversions, or submicroscopic events, such as GAS6-RASA3. Also,
transcriptome sequencing can unveil RNA chimeras, lacking DNA
aberrations, as demonstrated by the discovery of a recurrent,
prostate specific, read-through of SLC45A3 with ELK4 in prostate
cancers. Further classification of RNA based events using
paired-end sequencing revealed numerous broadly expressed chimeras
between adjacent genes. Although these events were not necessarily
read-through events, because they typically had different
orientations, we believe they represent extensions of transcriptional units
beyond their annotated boundaries. Unlike single read based
approaches, which require chimeras to span exon boundaries of
independent genes, we were able to detect these events using
paired-end sequencing, which could have significant impact for
improving how we annotate transcriptional units.

Overall, we have demonstrated the advantages of employing a
paired-end transcriptome strategy for chimera discovery,
established a methodology for mining chimeras, and extensively
catalogued chimeras in prostate and hematological cancer models. We
believe that the sensitivity of this approach will be of broad impact
and significance for revealing novel causative gene fusions in
various cancers while revealing additional private gene fusions that
may contribute to tumorigenesis or cooperate with driver gene
fusions.

Methods 

Paired-End Gene Fusion Discovery Pipeline. Mate pair transcriptome reads were
mapped to the human genome (hg18) and Refseq transcripts, allowing up to 2
mismatches, using Efficient Alignment of Nucleotide Databases (ELAND) pair
within the Illumina Genome Analyzer Pipeline software. Illumina export output
files were parsed to categorize passing filter mate pairs as (i) mapping to the same
transcript, (ii) ribosomal, (iii) mitochondrial, (iv) quality control, (v) chimera
candidates, and (vi) nonmapping. Chimera candidates and nonmapping categories
were used for gene fusion discovery. For the chimera candidates category, the
following criteria were used: (i) mate pairs must be of high mapping quality (best
unique match across genome), (ii) best unique mate pairs do not have a more
logical alternative combination (i.e., best mate pairs suggest an
interchromosomal rearrangement, whereas the second best mapping for a mate reveals the
pair has an alignment within the expected insert size), (iii) the sum of the
distances between the most 5' and 3' mate on both partners of the gene fusion
must be <500 nt, and (iv) mate pairs supporting a chimera must be
nonredundant.
In addition to mining mate pairs encompassing a fusion boundary, the
nonmapping category was mined for mate pairs that had 1 read mapping to a gene,
whereas its corresponding read fails to align, because it spans the fusion
boundary. First, the annotated transcript that the "mapping" mate pair aligned against
was extracted, because this transcript represents one of the potential partners
involved in the gene fusion. The "nonmapping" mate pair was then aligned
against all of the exon boundaries of the known gene partner to identify a perfect
partial alignment. A partial alignment confirms that the nonmapping mate pair
maps to our expected gene partner while revealing the portion of the
nonmapping mate pair, or overhang, aligning to the unknown partner. The overhang is
then aligned against the exon boundaries of all known transcripts to identify the
fusion partner. This process is done using a Perl script that extracts all possible
University of California Santa Cruz (UCSC) and Refseq exon boundaries looking
for a single perfect best hit.
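
The overhang search can be sketched as follows. This is not the Perl script referred to above but a simplified Python illustration under stated assumptions: the function name, exon names, and sequences are hypothetical; the read is assumed to run from the known partner into the unknown partner on the same strand; and `min_anchor` (the minimum number of bases required on each side of the junction) is a parameter introduced here for illustration.

```python
def find_fusion_junction(read, known_partner_exons, candidate_exons,
                         min_anchor=4):
    """Return (known_exon, partner_exon, split) for a single perfect hit.

    The read is split at every allowed position. The left part must
    match the 3' end of an exon of the known partner exactly, and the
    right part (the overhang) must match the 5' start of a candidate
    exon exactly. Returns None when there is no hit or more than one.
    """
    hits = []
    for split in range(min_anchor, len(read) - min_anchor + 1):
        head, overhang = read[:split], read[split:]
        for known_name, known_seq in known_partner_exons.items():
            if not known_seq.endswith(head):
                continue
            for cand_name, cand_seq in candidate_exons.items():
                if cand_seq.startswith(overhang):
                    hits.append((known_name, cand_name, split))
    return hits[0] if len(hits) == 1 else None


# Hypothetical sequences, for illustration only.
known = {"exonA3": "ACGTACGTTTGACCA", "exonA4": "GGGTTTAAACCC"}
candidates = {"exonB1": "GGCATCGATCGA", "exonC2": "TTTTGGGGCCCC"}
junction = find_fusion_junction("TTGACCAGGCATC", known, candidates)
```

Requiring exactly one hit mirrors the "single perfect best hit" criterion; ambiguous reads are discarded rather than assigned.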

Mate pairs spanning the fusion boundary are merged with mate pairs
encompassing the fusion boundary. At least 2 independent mate pairs are required to
support a chimera nomination, which can be achieved by (i) 2 or more
nonredundant mate pairs spanning the fusion boundary, (ii) 2 or more nonredundant
mate pairs encompassing a fusion boundary, or (iii) 1 or more mate pairs
encompassing a fusion boundary and 1 or more mate pairs spanning the fusion
boundary. All chimera nominations were normalized based on the cumulative number
of mate pairs encompassing or spanning the fusion junction per million mate
pairs passing filter.
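
The nomination rule and the normalization can be written compactly. The sketch below is a minimal illustration, assuming that each mate pair is represented by a hypothetical tuple of alignment coordinates and that two mate pairs with identical coordinates are redundant; the function names and counts are not taken from the pipeline.

```python
def nonredundant(mate_pairs):
    """Collapse mate pairs that share identical alignment coordinates."""
    return set(mate_pairs)


def is_nominated(encompassing, spanning):
    """Apply the three ways of reaching 2 independent mate pairs."""
    n_enc = len(nonredundant(encompassing))
    n_span = len(nonredundant(spanning))
    return n_span >= 2 or n_enc >= 2 or (n_enc >= 1 and n_span >= 1)


def normalized_support(encompassing, spanning, pairs_passing_filter):
    """Supporting mate pairs per million mate pairs passing filter."""
    total = len(nonredundant(encompassing)) + len(nonredundant(spanning))
    return total * 1_000_000 / pairs_passing_filter


# Hypothetical alignments as (position_mate1, position_mate2).
enc = [(100, 900), (100, 900), (120, 950), (140, 980)]
span = [(150, 990)]
nominated = is_nominated(enc, span)
support = normalized_support(enc, span, pairs_passing_filter=2_000_000)
```

Because the counts are nonredundant, the three conditions together amount to requiring at least 2 supporting mate pairs in total.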

RNA Chimera Analysis. Chimeras found from UHR, HBR, VCaP, and K562 were
grouped based on whether they showed expression in all samples, "broadly
expressed," or a single sample, "restricted expression." Because UHR includes
K562, chimeras found in only these 2 samples were also considered as
restricted. Heatmap visualization was conducted by using TIGR's MultiExperiment
Viewer (TMeV) version 4.0 (www.tm4.org).

Additional Details. Additional details can be found in SI Text.

Abstract

RhoC  protein,  a  known  marker  of  metastases  in  aggressive 
breast  cancers  and  melanoma,  has  also  been  found  to 
be overexpressed in certain head and neck cancers. We therefore
investigated the correlation between RhoC
expression  and  the  metastatic  behavior  of  head  and  neck 
squamous  cell  carcinoma.  Selective  inhibition  of  RhoC 
expression  was  achieved  using  lentiviral  small  hairpin  RNA 
(shRNA)  transduced  and  tracked  with  green  fluorescent 
protein  to  achieve  70%  to  80%  RhoC  inhibition. 
Fluorescence  microscopy  of  the  RhoC  knockdown  stable 
clones  showed  strong  green  fluorescence  in  the 
majority  of  cells,  signifying  a  high  efficiency  of 
transduction.  Importantly,  quantitative  real-time  PCR 
showed  no  significant  decrease  in  the  mRNA  expression 
levels  of  other  members  of  the  Ras  superfamily.  Cell 
motility  and  invasion  were  markedly  diminished  in 
RhoC-depleted  cell  lines  as  compared  with  control 
transduced  lines.  H&E  staining  of  lung  tissue  obtained  from 
severe  combined  immunodeficiency  mice,  which  had 
been  implanted  with  RhoC  knockdown  cells,  showed  a 
marked  decrease  in  lung  metastasis  and  inflammation  of 
the  blood  vessels.  The  cultured  lung  tissue  showed  a 
significant  decrease  in  cell  growth  in  mice  implanted 
with  RhoC-depleted  cell  lines  as  compared  with 
shRNA-scrambled  sequence  control  lines.  Microscopic 
studies  of  CD31  expression  revealed  substantial 
quantitative  and  qualitative  differences  in  the  primary  tumor 
microvessel  density  as  compared  with  parental  and 
shRNA-scrambled  controls.  This  study  is  the  first  of  its 
kind  to  establish  the  involvement  of  RhoC  specifically  in 
head and neck metastasis. These findings suggest that
RhoC warrants further investigation to delineate its
robustness as a novel potential therapeutic target. (Mol
Cancer Res 2009;7(11):1771-80)

Introduction 

Head and neck cancer is the sixth most common cancer
worldwide (1). According to the statistical report of the American
Cancer Society, ~70,000 new head and neck squamous
cell carcinomas (HNSCC) will be diagnosed this year in the
United States (2). In contrast to other epithelial cancers for
which effective screening exists, most of the patients with head
and neck cancer are diagnosed at a very late stage (stage III and
IV). Despite advancements in surgical procedures, chemotherapy,
and radiation therapy, survival rates have not improved in
the last several decades (3). Furthermore, it has been shown
that the high rate of morbidity is due to both locoregional
recurrence and distant metastases.

In the past decade, numerous studies have shown that the
Rho family of GTPases (RhoA, RhoB, RhoC, Rac1, Rac2,
Rac3, and CDC42) is involved in instilling a metastatic phenotype
into cancerous cells that are localized to the organ
of origin. RhoA and RhoC are overexpressed in a number
of tumor types (4, 5), suggesting an oncogenic role. Among the
members of the Ras homology protein family, RhoC (molecular
mass of 21 kDa) has been implicated in a wide range of cellular
activities, including downstream expression of inflammatory
genes and chemokines, cell proliferation, intracellular signaling,
and cytoskeletal organization (6). More significantly,
RhoC plays a central role in assembling focal adhesion by
modulating the orientation of cytoskeletal fibers, resulting in cell
polarity, increased cell motility, and consequently, increased
invasiveness (7-9). In addition, signaling mediated by Rho
proteins through Rho-activating kinase regulates proteins that in
turn regulate actin polymerization, such as cofilin, profilin,
and formin homology proteins (10). Interestingly, high levels
of RhoC and Rho-activating kinase are also associated with
membrane blebbing, a phenomenon that is observed in motile
or invasive cells (10, 11).

RhoC overexpression is now well documented in a wide
range of malignant cancers, suggesting an important role in
changing noninvasive carcinomas into invasive forms.
Interestingly, overexpression of RhoC has been reported in
inflammatory breast cancer and exclusively in invasive breast
carcinoma (12-15). Other tumor types in which overexpression
of RhoC has been reported are ovarian carcinoma (16),
esophageal squamous cell carcinoma (17), pancreatic cancer
(18), gastric cancer (17, 19), and human melanoma (11, 20).
In addition, functional studies have shown that RhoC can act
as a transforming oncogene when it is overexpressed in human
mammary epithelia, converting these normally immobile
cells into highly motile and invasive malignant cells (12, 21).
Thus, a wide range of current studies reveal the important role
of RhoC in cancer metastasis.

However, very few studies to date have investigated the
role of RhoC in head and neck cancer. Studies on gene
expression profiling of stage III and IV regionally metastatic
HNSCC showed that there are elevated levels of RhoC when
compared with stage I and II localized malignancy (22).
Furthermore, in our laboratory, we have shown that RhoC expression
is elevated in the tumors of patients with HNSCC when
compared with normal squamous cell epithelium (21). More
importantly, our study showed that increased RhoC expression
is strongly associated with lymph node metastasis and could
also be used to predict metastasis even in small (T1 and T2)
primary tumors (23). In the present study, we investigated the
role of RhoC in head and neck metastasis by inhibiting its
function using RNA interference. Our in vitro findings
determined that inhibiting RhoC function strongly reduced cell
motility and invasion. Furthermore, we observed a remarkable
reduction in tumor metastasis and microvessel density in
severe combined immunodeficiency (SCID) mice injected with
RhoC knockdown cell lines. These findings suggest that
inhibition of RhoC function in HNSCC can diminish a tumor's
aggressive behavior, thus opening new possibilities for future
drug therapies targeting this pathway.

Results 

RhoC  mRNA  Expression  Is  Greatly  Reduced  in 
Knockdown  Clones  from  HNSCC  Cell  Lines 

To  understand  the  role  of  RhoC  expression  in  head  and  neck 
metastasis,  we  constructed  cellular  reagents  in  which  RhoC  ex¬ 
pression  was  downregulated  by  small  hairpin  RNA  (shRNA)  in 
squamous cell carcinoma cell lines from the University of
Michigan (UM-SCC-11A and UM-SCC-1). These cells exhibit a
strongly invasive phenotype, and our previous studies have
shown that RhoC is constitutively active in these lines (23).

The inhibition of RhoC expression was achieved using RNA
interference with lentiviral transfection and transduction
technology. After lentiviral infection, positive (stable) clones were
selected using puromycin. Fluorescence microscopy of the stable
clones showed strong green fluorescence in the majority of the
cells, signifying a high efficiency of infection (Fig. 1).

We then tested the effectiveness of the shRNA in depleting
RhoC mRNA expression in the lentivirally infected cell lines
using quantitative real-time PCR (qRT-PCR). Because only a
small number of specific gene sequences are capable of
activating the RNA degradation pathway, we used two different
RhoC knockdown clones (C1 and C2), along with a parental
control and a control infected with an shRNA-scrambled sequence,
to ensure that RhoC levels were effectively depleted. The results
show greatly reduced expression of the RhoC gene in the C1 and
C2 RhoC knockdown clones, whereas normal RhoC expression was
observed in clones with an shRNA-scrambled sequence (Fig. 2).
The relative RhoC mRNA expression in parental, shRNA-scrambled
control, and RhoC knockdown clones 1 and 2 was evaluated by
qRT-PCR, and the CT values thus obtained were normalized using
two housekeeping genes as described in Materials and Methods.
As shown in Fig. 2A, RhoC mRNA expression decreased ~75% and
80% in RhoC knockdown clone 1 and clone 2 of UM-SCC-1,
respectively. A decrease of 40% and 70% in RhoC mRNA levels was
observed in RhoC knockdown clone 1 and clone 2 of UM-SCC-11A,
respectively. In contrast, the control shRNA-scrambled sequence
did not produce any significant reduction in RhoC mRNA
expression in either cell line (Fig. 2A). To confirm that
only RhoC expression was being inhibited, the mRNA levels
of other Rho family members (Cdc42, Rac1, and Rac2) were
also analyzed by qRT-PCR. As shown in Fig. 2B, C, and D,
the expression levels of Cdc42, Rac1, and Rac2 in the knockdown
clones were very similar to those in the parental lines, confirming
that our shRNA process is highly specific to RhoC. These studies
showed that the RhoC machinery can be “switched off” by
decreasing total levels of RhoC mRNA expression, and therefore
further detailed studies of its functional roles are justified. One
of the most basic clinical questions that arises at this point is how
inhibition of the RhoC transcript affects metastasis in head and
neck cancer. To address this question, we investigated two
characteristic behaviors of metastatic cells, invasion and motility,
in the transduced cell lines.

RhoC  Knockdown  Clones  Show  a  Decrease  in  Cell 
Invasion  and  Motility 

In the invasion assays, RhoC-depleted clones of UM-SCC-11A
and UM-SCC-1 were markedly less invasive and motile
than the parental or shRNA-scrambled controls
(Fig. 3). Notably, cell invasion was decreased by 50% and
75% in RhoC knockdown clones 1 and 2, respectively, in the
transduced UM-SCC-11A cell line (Fig. 3I). A similar decrease
of 60% and 80% in clones 1 and 2, respectively, was observed in
UM-SCC-1 lines (Fig. 3J) when compared with their parental or
shRNA-scrambled controls (n = 3; P < 0.003).

We  hypothesized  that  RhoC  plays  an  important  role  in  cell 
motility  in  HNSCC.  We  therefore  investigated  the  effect  of 
RhoC  on  cell  motility  using  the  scratch  model.  A  noticeable 
decrease  in  cell  motility  was  observed  in  RhoC  knockdown 
clones  as  compared  with  the  parental  or  shRNA-scrambled 
sequence  control  lines  (Fig.  4A  and  B;  n  =  3,  P  <  0.005). 
These  in  vitro  assays  provide  evidence  for  the  first  time  that 
RhoC  plays  an  important  role  in  cell  invasion  and  motility 
and  suggest  that  RhoC  is  important  for  metastasis  in  head 
and  neck  cancer. 

RhoC  Plays  an  Important  Role  in  Lung  Metastasis  and 
Microvessel  Density  Formation 

Besides localized tumor growth, lung metastasis is a frequent
occurrence in patients with head and neck cancers (24). With
this in mind, we designed an in vivo study in which we could
analyze the effect of RhoC inhibition on lung metastasis and
primary tumor vascularity. This was achieved by injecting
transduced cell lines through the tail veins of SCID mice and
analyzing them for lung metastasis. Because both clones gave
similar results in our cell motility and invasion assays, we
selected RhoC knockdown clone 2 for our subsequent in vivo
studies, and all results discussed hereafter are based on this
clone. The xenograft mice were sacrificed 2 weeks after
implantation and their lungs were analyzed for metastasis using
H&E stain. As shown in Fig. 5A and B, in the mice injected with
UM-SCC-11A parental or shRNA-scrambled control, a large
metastatic focus and inflamed blood vessels were observed in the
lung region (marked by arrows). A similar set of results
was obtained for the UM-SCC-1 parental and shRNA-scrambled
sequence control (Fig. 5D and E). In contrast, mice injected with
the RhoC knockdown clone had very small metastatic tissue with
barely visible patches of inflamed blood vessels in UM-SCC-11A
and UM-SCC-1, respectively (Fig. 5C and F).

FIGURE 1. Lentivirus-infected cells showing GFP expression levels. A. UM-SCC-11A cell line transfected with shRNA-scrambled sequence control (SR),
RhoC knockdown clone 1 (C1), RhoC knockdown clone 2 (C2), and uninfected cells as controls (negative). Histograms obtained by flow cytometry (top), and
GFP-labeled cells under fluorescent (middle) and bright-field (bottom) light. B. UM-SCC-1. All other notations are the same as described above. As shown by the
GFP expression levels, a high number of cells (80-90%) were successfully infected with recombinant lentivirus.

FIGURE 2. qRT-PCR of cell lines UM-SCC-1 and UM-SCC-11A showing the relative mRNA expression levels of RhoC (A), Cdc42 (B), Rac1 (C), and
Rac2 (D) in parental (control), shRNA-scrambled sequence control, and RhoC knockdown clones 1 and 2 after selection and establishment of positive
clones. Results were analyzed using the 2^(-ΔΔCT) method. A significant decrease in RhoC mRNA levels was obtained in the RhoC knockdown clones, whereas the
expression of Cdc42, Rac1, and Rac2 remained unchanged (P < 0.05).

In addition, the remaining dissected lung tissues were
cultured to observe cell growth from the metastatic tumors.
The bar graphs show the number of cancer cells grown from the
digested lungs of mice injected with parental, shRNA-scrambled
control, and RhoC knockdown cells. There was a 67% and 58%
decrease in cell number in the RhoC knockdown clones of
UM-SCC-11A and UM-SCC-1, respectively, when compared with
their parental lines (Fig. 5G and H). These results strongly
suggest that inhibition of RhoC expression greatly reduces
metastasis in vivo.

Furthermore, to test the angiogenic role of RhoC, parental
and RhoC knockdown cells were implanted in the flank region
of the SCID mice. Microvessel density of the localized
solid primary tumor, which grows to a sizable volume
12 weeks after implantation in the flank region, was analyzed
using CD31 antibody. Microscopic analysis of the CD31-stained
tumors revealed a remarkable difference in microvessel
formation in mice implanted with RhoC knockdown clones
when compared with either the corresponding parental or
shRNA-scrambled control. In the control groups, well-developed
microvessels were observed in the tumors, in strong contrast
with the poorly developed microvessels in mice implanted with
the RhoC knockdown clone (Fig. 6). Our results are consistent
with published work on the essential role of RhoC in
angiogenesis (25, 26). A similar pattern of microvessel
development was observed in the UM-SCC-11A parental,
shRNA-scrambled, and RhoC knockdown lines
(data not shown). These results suggest that RhoC is required
for proper formation of the vascular network in a developing
tumor.


Discussion 

Tumor  metastasis  is  well  correlated  with  the  overexpres¬ 
sion  of  certain  oncogenes.  The  overexpression  of  the  Rho 
gene  family  has  been  reported  in  many  malignant  forms  of 
cancer  (27),  including  pancreatic  cancer  (18),  gastric  cancer 
(17,  19),  and  human  melanoma  (11,  20).  However,  there 
have  been  very  few  studies  on  whether  overexpression  of 
RhoC is involved in head and neck tumor metastasis. Previous
studies in our laboratories have shown that RhoC is actively
expressed in several well-established UM-SCC cell lines.
Among the cell lines tested, the UM-SCC-11A and UM-SCC-1
lines exhibited considerably high levels of RhoC protein (23).
In particular, the active form of RhoC (RhoC GTPase) was
observed to be constitutively expressed in the UM-SCC lines.
Therefore, for our current study, we selected two UM-SCC lines
(UM-SCC-11A and UM-SCC-1) to evaluate the role of RhoC in
HNSCC metastasis. Our first and foremost aim was to inhibit
RhoC expression in the two selected cell lines and analyze its
function in vitro. Our expectation was that motility and invasion
would be greatly reduced in RhoC-depleted cell lines as
compared with parental lines. In this study, we have shown
successful inhibition of RhoC gene expression and, subsequently,
its function using shRNA techniques (Fig. 2). Furthermore, our
data show that cell invasiveness and motility, which are
characteristics of aggressive head and neck cancer cell lines,
were diminished when RhoC expression was inhibited (Figs. 3
and 4). These results suggest that RhoC overexpression drives
cell invasion and motility in HNSCC. One of the major reported
functions of the Rho family of proteins is to control
cytoskeletal organization (28). Cytoskeletal proteins are
involved predominantly in cell motility. Therefore, RhoC
may control metastasis by modulating cell motility (29). To
move, cells need to turn over both cell-extracellular matrix
and cell-cell adhesions, the latter including both adherens
junctions and tight junctions (30, 31). It has also been
reported that RhoC plays a predominant role over RhoA in the
weakening of adherens junctions, which is an important step
towards transforming cells into an invasive phenotype (6).
These studies therefore raise the question of what effect RhoC
inhibition would have in vivo. Our in vivo results showed that
both inflamed blood vessels in the lungs and a large volume of
lung metastases were present in animals that received tail vein
injections of either parental or shRNA-scrambled sequence
(control) cell lines. In contrast, the lungs of mice implanted
or injected with RhoC knockdown lines showed few pathologic
findings, specifically minimal lung metastases and a very low
level of inflammation in lung tissues and blood vessels
(Fig. 5). Furthermore, the level of angiogenesis in the
localized tumors was assessed using CD31 antibody, and these
results showed a remarkable difference in both the quality and
the quantity of the microvessels in the tumors. The mice
implanted with RhoC knockdown lines showed markedly fewer
and more poorly developed microvessels as compared with the
far more numerous and clearly defined vessels in mice implanted
with parental or shRNA-control cell lines (Fig. 6).

FIGURE 3. Cell invasion assay of UM-SCC-11A and UM-SCC-1 lines transfected with RhoC shRNA. A and E. Parental cell lines; B and F. shRNA-
scrambled controls; C and G. RhoC knockdown clone 1; D and H. RhoC knockdown clone 2 of UM-SCC-11A and UM-SCC-1, respectively (magnification,
×40 and ×100). I and J. Columns, rates of invasion; bars, 95% CI (P < 0.05).

FIGURE 4. The effect of RhoC knockdown on cell motility. A and B. The slow movement of RhoC knockdown cells (after 24 h) as compared with their
parental or shRNA-scrambled controls in UM-SCC-11A and UM-SCC-1, respectively (magnification, ×40). Columns, percentage of motility with the initial
reference point as 0 h (P < 0.05).

The findings of this study open a fertile area of research
in HNSCC. For instance, recent work has shown that matrix
metalloproteinases (MMP), which are well-known mediators of
invasive tumor behavior, are specific and critical players in
the formation of lung metastasis (32, 33). Li et al. reported
that the oncogene AF1Q, which is responsible for primary breast
tumor growth and pulmonary metastasis, at least in part
regulates the expression of other MMPs and RhoC (34). The
remodeling of the actin cytoskeleton is a critical step in the
formation of pulmonary metastasis because of the resulting
changes in cell shape, polarity, cell interactions, and eventual
migration of the cancer cells. Interestingly, studies by Nelson
et al. (35) have shown that expression of the MMP3 gene, which
induces epithelial-mesenchymal transition in mammary epithelial
cells, is brought about by a change in cell shape through
changes in cytoskeletal structure mediated by Rac1 (also a
member of the Rho family). Clearly, future studies elucidating
the specific interactions between MMP2, MMP3, and MMP9 (major
MMP proteins in HNSCC) and RhoC are indicated, and these
interactions may prove to be one of the signaling pathways for
RhoC-mediated function.

In conclusion, the findings presented in this study illustrate
that, both in vivo and in vitro, RhoC plays an important role in
head and neck cancer progression and metastasis. With additional
investigations and ongoing development of RhoC-specific
inhibitors, RhoC may prove to be an important
therapeutic target in this patient population.

Materials  and  Methods 

Cell  Culture  and  Generation  of  Stable  RhoC  Knockdown 
Clones 

UM-SCC-11A and UM-SCC-1 are well-established cell
lines derived, respectively, from a 65-y-old patient with a
T2N2a tumor of the epiglottis and from a 46-y-old patient with a
T2N0 tumor of the false vocal cord (36, 37). These cell lines were
grown at 37°C in a humidified atmosphere of 95% air-5% CO2. The
cultures were maintained in DMEM (Life Technologies)
containing 10% heat-inactivated fetal bovine serum (FBS;
Hyclone) and supplemented with 50 μg/mL of penicillin G and
50 μg/mL of streptomycin sulfate.

FIGURE 5. The effect of RhoC knockdown on lung metastasis in SCID mice injected through the tail vein with UM-SCC-11A and UM-SCC-1 cell lines
transfected with RhoC shRNA. The lung sections were stained with H&E to show the degree of metastasis. A and D. Parental; B and E. shRNA-
scrambled controls; C and F. RhoC knockdown clone 2. Black arrows, inflamed blood vessels present only in parental and scrambled-sequence controls
(magnification, ×100). G and H. Number of cells obtained by culturing the lungs for UM-SCC-11A and UM-SCC-1, respectively, showing a marked reduction
in RhoC knockdown clone 2 (P < 0.05).

FIGURE 6. Assessment of microvessel density in UM-SCC-1 lines
using immunostaining with CD31 antibody. A and B. Parental and
shRNA-scrambled sequence with well-developed microvessels. C.
RhoC knockdown clone 2 with much smaller and poorly developed
microvessels (magnification, ×40).

Islam et al. Mol Cancer Res 2009;7(11). November 2009

RhoC knockdown and scrambled sequence constructs with a
green fluorescence protein (GFP) tag and puromycin resistance
sites were synthesized by the vector core facility of
the University of Michigan (http://www.med.umich.edu/vcore).
The sequences used for the RhoC constructs are available from
Open Biosystems (http://www.openbiosystems.com/) and include
oligo ID V2LHS_69446 and V2LHS_69410, accession number
NM_001042678. The sequences of the constructs are
69446 = 5'-ATACTGTCTTTGAGAACTATAT (sense; for
RhoC knockdown clone 1) and
69410 = 5'-CACCAGCACTTTATACACTTC (sense; for RhoC
knockdown clone 2). The sequence of the shRNAmir nonsilencing
(scrambled) control is
ATCTCGCTTGGGCGAGAGTAAGTGCTGTTGACAGTAAGCGATCTCGCTTGGGCGAGAGTAAGTAGTGAAGCCACAGATGTACTTACTCTCGCCCAGCGAGAGTGCCTACTGCCTCGGA.
This control sequence does not match any known mammalian gene
(the sequence has at least three mismatches against any gene,
as determined by nucleotide alignment/BLAST of the target
22-mer sequence). This is the nonsilencing shRNAmir hairpin
sequence found in the pSM2, pSMP, pGIPZ, pTRIPZ, and
pLemiR nonsilencing controls.
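
The mismatch criterion stated for the nonsilencing control (at least three mismatches against any gene) can be illustrated with a minimal Python sketch. This is not BLAST: it only slides a 22-mer along one strand of a target sequence, without gaps, and reports the smallest mismatch count found. The function name `min_mismatches`, the toy transcript, and the choice of the first 22 nucleotides of the control sequence as the targeting 22-mer are assumptions made for illustration and are not taken from the study.

```python
def min_mismatches(query: str, target: str) -> int:
    """Return the fewest mismatches between `query` and any ungapped,
    same-length window of `target` (one strand only; not a BLAST search)."""
    query, target = query.upper(), target.upper()
    if not query or len(target) < len(query):
        raise ValueError("target must be at least as long as a non-empty query")
    best = len(query)
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        # Count positions where the two sequences differ.
        mismatches = sum(a != b for a, b in zip(query, window))
        best = min(best, mismatches)
    return best


# Assumption: the targeting 22-mer is the first 22 nt of the control
# sequence quoted in the text.
NONSILENCING_22MER = "ATCTCGCTTGGGCGAGAGTAAG"

# Hypothetical toy transcript, invented for this example (not a real gene).
TOY_TRANSCRIPT = "GGCATCTCGCTAGGGCGTGAGTTAGCC"

if __name__ == "__main__":
    n = min_mismatches(NONSILENCING_22MER, TOY_TRANSCRIPT)
    print(f"minimum mismatches: {n}; meets the >=3 criterion: {n >= 3}")
```

A real specificity check would search both strands of a full transcript database and allow gapped alignments, which is what a BLAST search provides.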

293FT cells (Invitrogen) were transfected using a 250 mmol/L
CaCl2 solution containing the RhoC shRNA construct, 25 μmol/L
chloroquine, and viral particles (i.e., Gag, Pol, and Env) and
grown overnight. The medium was changed after 12 h to remove
chloroquine, and fresh DMEM-10% FBS was added to
the growing 293FT cells to produce the virus. The supernatants
from these cells were collected and 1 mL of this solution
was added to growing UM-SCC-11A and UM-SCC-1 lines.
Cells were incubated at 37°C and GFP expression was
monitored 48 h after infection. Positive (stable) clones were
selected using puromycin (1.6 and 2.0 μg/mL for
UM-SCC-11A and UM-SCC-1, respectively). These were then
analyzed using fluorescence microscopy, which showed
strong green fluorescence in the majority of the cells, signifying
a high efficiency of infection (Fig. 1A and B). Furthermore,
flow cytometry analyses showed that the number of noninfected
cells was very low (Fig. 1A and B).

Flow  Cytometry  Analyses 

Lentivirus-infected cells at approximately 70% to 80%
confluence were harvested using trypsin-EDTA solution and
resuspended in phosphate-buffered saline containing 3% FBS,
0.5 mmol/L EDTA, and 60 units/mL of DNase. Flow cytometry
analysis was done using a BD FACS Aria IIU flow cytometer
equipped with a 488 nm, 15 mW, air-cooled argon laser
(Analytical Cytometry Laboratory, Ohio State University
Comprehensive Cancer Centre). GFP-positive cells were sorted out
and grown for subsequent experiments.

qRT-PCRs 

Total RNA was isolated according to the standard procedure
using TRIzol reagent (Invitrogen). qRT-PCR was conducted
with a TaqMan probe system from Applied Biosystems
using the following products: Cdc42, Hs03044122_g1; Rac1,
Hs01025984_m1; Rac2, Hs01032884_m1; and RhoC,
Hs00733980_m1. β-Actin and G3PDH were used as the data
normalizers. Relative changes in gene expression were
calculated using the 2^(-ΔΔCT) method (38).
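
As an illustration of the 2^(-ΔΔCT) calculation, the short Python sketch below computes relative expression as follows: ΔCT = CT(target) minus the mean CT of the reference genes; ΔΔCT = ΔCT(sample) minus ΔCT(parental control); relative expression = 2^(-ΔΔCT). Averaging the CT values of the two normalizers is an assumption, because the text does not state how the two housekeeping genes were combined. The function name and all CT values are hypothetical and are not data from this study.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_refs_sample,
                        ct_target_control, ct_refs_control):
    """Relative expression of a target gene by the 2^(-ddCT) method.

    ct_refs_* are lists of CT values for the reference (housekeeping)
    genes; they are averaged here, which is an assumption of this sketch.
    """
    d_ct_sample = ct_target_sample - mean(ct_refs_sample)
    d_ct_control = ct_target_control - mean(ct_refs_control)
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical CT values: RhoC and two normalizers in a knockdown
    # clone versus the parental line.
    fold = relative_expression(
        ct_target_sample=26.0, ct_refs_sample=[18.0, 20.0],
        ct_target_control=24.0, ct_refs_control=[18.0, 20.0],
    )
    print(f"relative expression: {fold:.2f}")
    print(f"percent decrease: {100 * (1 - fold):.0f}%")
```

With these made-up inputs, ΔΔCT is 2 cycles, which corresponds to one quarter of the parental expression level, that is, a 75% decrease.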

Cell  Invasion  and  Motility  Assay 

Invasion Assay. Cell invasion assays were done using the BD
BioCoat Matrigel Invasion Chamber (BD Biosciences) according
to the instructions of the manufacturer. Briefly, ~2.5 × 10^5
cells in 2 mL of serum-free DMEM were added at the top of the
insert and 1 mL of the medium was added in the bottom well of
each insert. FBS albumin was added to the medium in the lower
chamber (final concentration of FBS was 10%, v/v), where it
acted as a chemoattractant. Cells were incubated for 22 h in
a humidified cell culture incubator at 37°C in a 5% CO2
atmosphere. Next, the noninvading cells at the top of the insert
were removed with a cotton-tipped swab. The invading
cells, which were attached to the underside of the membrane,
were fixed in 100% methanol and stained with 1% Toluidine
prepared in 100% methanol. After repeated washing of the
membrane with distilled water, stained cells were allowed to
air-dry at room temperature before being visualized under a
microscope. A parallel experiment with control inserts (without
Matrigel) was also run. Matrigel-invaded cells were counted
microscopically at 40× and 100× magnifications.

Motility Assay. Cell motility assays were done in 100 mm
Petri dishes. At ~80% confluence, cells were washed with
PBS, and a fine scratch in the form of a groove was made with
a sterile pipette tip and immediately photographed.
We designated this time as 0 h. Next, cells were
supplemented with DMEM containing 10% FBS and allowed to
grow. Migration of cells from the edge of the groove
towards the center was monitored microscopically at 40×
magnification after 24 h to assess the extent of scratched area
covered. The width of the scratch was measured at 0 h and
after 24 h to calculate the percentage of the gap covered by
the cells in a 24-h time period.
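
The percentage of the gap covered can be computed from the two width measurements as 100 × (width at 0 h − width at 24 h) / width at 0 h. The Python sketch below shows this arithmetic. The function name and the example widths are hypothetical, and the use of a single width measurement per time point is a simplifying assumption.

```python
def percent_gap_covered(width_0h: float, width_24h: float) -> float:
    """Percentage of the initial scratch width covered by cells after 24 h.

    Both widths must be in the same unit (for example, micrometers).
    """
    if width_0h <= 0:
        raise ValueError("initial scratch width must be positive")
    if width_24h < 0:
        raise ValueError("scratch width cannot be negative")
    return 100.0 * (width_0h - width_24h) / width_0h


if __name__ == "__main__":
    # Hypothetical widths in micrometers, not data from this study.
    print(percent_gap_covered(width_0h=500.0, width_24h=125.0))
```

In practice, several width measurements along the scratch would usually be averaged at each time point before applying the formula.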

Animal  Xenograft 

Athymic SCID mice were obtained from the Jackson
Laboratory; 6-wk-old mice were housed in cages of five animals
each. Five animals per treatment were selected to receive
parental, shRNA-scrambled sequence control, and RhoC knockdown
clone cells, resulting in 15 animals per cell line for each set of
experiments. Approximately 5 × 10^6 UM-SCC-11A and
UM-SCC-1 cells were suspended in 100 μL of serum-free
DMEM and injected through the tail vein and/or in the flank
region of mice using a 0.5-inch, 27-gauge needle. Animals were
monitored every other day for their general health and activities.
At the end of the second week, the animals were euthanized
using a CO2 chamber. The lungs were dissected and half of the
lungs were fixed in buffered formalin for 6 h, thereafter
transferred to 70% methanol, and then processed to form
paraffin-embedded tissue blocks (H&E staining was also done). The
remaining half of the lungs was digested in collagenase for
culturing the cells. At the end of week 12, tumors in the flank region
were fully grown. The animals were euthanized and tumors were
dissected and fixed in the same way as described above for
CD31 staining.

Lung  Metastases 

Slides of 5-μm-thick sections of lungs were prepared and
stained with H&E. Five random fields were microscopically
examined in a blinded fashion at 100× magnification to detect
metastases.

Microvessel  Density 

Microvessel density in all primary tumors was assessed
using anti-mouse CD31 antibody (PharMingen) at a dilution of
1:250. Five random low-power fields (40× magnification) were
selected to visualize the microvessels. The mean was reported
for each tumor, with scoring done in a blinded fashion.

Statistical  Analysis 

Statistical analyses were done using Sigma GraphPad Prism
4 software. The mean ± SD was reported. Differences were
considered to be statistically significant at P < 0.05.

Genomic translocations leading to the expression of chimeric transcripts characterize several hematologic,
mesenchymal  and  epithelial  malignancies.  While  several  gene  fusions  have  been  linked  to  essential 
molecular  events  in  hematologic  malignancies,  the  identification  and  characterization  of  recurrent  chimeric 
transcripts  in  epithelial  cancers  has  been  limited.  However,  the  recent  discovery  of  the  recurrent  gene  fusions 
in  prostate  cancer  has  sparked  a  revitalization  of  the  quest  to  identify  novel  rearrangements  in  epithelial 
malignancies.  Here,  the  molecular  mechanisms  of  gene  fusions  that  drive  several  epithelial  cancers  and 
the  recent  technological  advances  that  increase  the  speed  and  reliability  of  recurrent  gene  fusion  discovery 
are  explored. 

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Throughout  history,  technological  advances  are  often  followed  by 
discoveries  that  dramatically  alter  our  perceptions  of  disease  etiology. 
For example, after the term “chromosome” was introduced in the mid-1800s,
several German pathologists began using techniques to
compare  gross  mitotic  changes  in  tissue  sections  from  different 
human malignancies [1]. Almost half a century later, Theodor
Boveri published a critical hypothesis that “mammalian tumors might
be  initiated  by  mitotic  abnormalities  that  resulted  in  a  change  in  the 
number  of  chromosomes  in  the  cell  (aneuploidy)”,  based  on  the 
observation  that  sea  urchin  embryos  would  frequently  engage  in 
uncommon  development  following  mitotic  abnormality  [2].  As  time 
passed,  breakthroughs  arose  that  dramatically  increased  the  quality 
and  reproducibility  of  cytogenetic  techniques  such  as  the  use  of 
colchicine,  which  arrests  cells  in  mitosis  by  inhibiting  microtubule 
assembly.  As  a  result  of  these  observations,  the  general  hypotheses 
regarding  the  evolution  of  human  disease  became  increasingly 
complex;  particular  pathological  conditions  were  associated  with 
specific  chromosomal  abnormalities,  such  as  Lejeune's  association  of 
Down  syndrome  with  an  extra  copy  of  chromosome  21  [3,4]. 

Advances  in  technology  once  again  spurred  discovery  when,  in 
1958,  Rothfels  and  Siminovitch  published  a  new  cytogenetic,  air¬ 
drying  technique  for  flattening  chromosomes  [5].  The  application  of 
this  technology  later  allowed  Hungerford  and  Nowell  to  further 
characterize  their  initial  observation  that  two  patients  with  chronic 
myelogenous  leukemia  (CML)  had  a  characteristic  small  chromosome 
[6]. Soon after the initial publication, Hungerford and Nowell were
able to report on a series of seven patients, all of whom harbored this
minute chromosome [7]. This was named the “Philadelphia chromosome”
after the city in which the abnormal chromosome was
discovered, in accord with the Committee for the Standardization of
Chromosomes  [8].  The  rearrangement  leading  to  the  Philadelphia 
chromosome  was  eventually  characterized  as  a  translocation  between 
chromosomes  9  and  22  [9],  resulting  in  the  fusion  of  the  breakpoint 
cluster region (BCR) gene on chromosome 22 with the v-abl Abelson
murine leukemia viral oncogene homolog (ABL1) gene on chromosome 9
[10]. Later, in 1990, Lugo et al. demonstrated that the BCR-ABL1
fusion protein is an active tyrosine kinase by immunoblotting
cell lysates from transfected Rat 1 cells, revealing that cells transfected
with  either  BCR-ABL1  or  v-src,  but  not  v-H-ras  or  v-myc,  had  a 
significant  increase  in  total  phosphotyrosine  content  [11].  Under¬ 
standing  the  molecular  mechanism  of  BCR-ABL1  led  to  the  develop¬ 
ment  of  one  of  the  first  molecularly  tailored  therapies  as  the  small 
molecule  Imatinib  was  specifically  selected  for  its  ability  to  inhibit 
BCR-ABL1  kinase  activity  [12,13].  The  success  of  treating  chronic 
myelogenous  leukemia  with  a  specific  inhibitor  of  the  BCR-ABL1 
chimera  led  to  a  strong  interest  in  the  discovery  of  novel  gene  fusions 
in other cancer subtypes with the long-term goal of designing
disease-specific therapeutics.

As  techniques  like  the  use  of  chromosome  banding  for  karyotypic 
analysis  were  improved,  the  impact  on  discovery  of  novel  gene  fusions 
was immediately evident in leukemias and lymphomas. In fact, while
BCR-ABL1 is perhaps the most famous gene fusion, the first
molecularly characterized chimera was discovered by Zech et al.
through the use of karyotypic analysis and is involved in the
pathogenesis of Burkitt's lymphoma. While this karyotypic analysis
demonstrated absence of the distal region on the long arm of
chromosome 8 and an extra band in the distal segment of the long
arm of chromosome 14 [14], the genes involved in the
rearrangement remained elusive until 1982, when it was
demonstrated that the translocation altered the c-MYC oncogene [15] and
that the promoter and 5' region of the immunoglobulin heavy chain
(IGH) gene were rearranged such that the IGH promoter controls
c-MYC expression [16]. Although this fusion does not lead to a chimeric
protein, it was demonstrated that aberrant c-MYC expression through
the IGH promoter is a necessary component of malignant
transformation in Burkitt's lymphoma [17].

As with lymphoma research, karyotypic analysis rapidly led to the
identification of recurrent breakpoints that seemed to characterize
subsets of myeloid leukemia. For example, in 1973, the acute myeloid
leukemia 1 (AML1) gene was cloned from the breakpoint region of the
first recurrent translocation described in leukemia, t(8;21) [18]. In
1991, the AML1 gene was found to be fused to the eight-twenty one
(ETO) gene on chromosome 8, which is also known as runt-related
transcription factor 1 translocated to 1 (RUNX1T1) [19,20].

As the techniques of molecular biology improved, it became easier
and easier to obtain the DNA sequence adjacent to chromosomal
breakpoints. Since the original identification of AML1 in myeloid
leukemia, over 10 genes have been described to participate in
rearrangements with AML1 [21]. In fact, advances in sequencing
technology led to the realization that several genes are recurrently and
promiscuously fused to multiple partners, examples of which are
ever increasing. In addition to AML1, the other notable example of a
promiscuous fusion gene partner is the mixed-lineage leukemia (MLL)
gene, which is involved in over 40 different rearrangements (reviewed
in [22]). In fact, because of the variety and difficulty of discussing all
chromosomal aberrations in human malignancies, Mitelman et al.
maintain and frequently update an online database of rearrangements
and chromosome aberrations from all malignant neoplasms [23].

With the rapid development of current technologies like high-throughput
sequencing, our understanding of the origins of disease has come to
include a critical involvement of chromosomal aberrations, in particular
the role of translocations and gene fusions in malignant development.
With a better understanding of the role of these chromosomal aberrations,
therapies designed to inhibit the molecular function of chimeric proteins
have recently been developed and, like Imatinib, some have demonstrated a
window of strong efficacy. Consequently, much hope has been generated by
the potential for targeting existing and novel gene fusions that
characterize specific cancer subtypes with rationally designed,
molecularly tailored therapies. Here, we review known genomic
rearrangements in epithelial tumors that lead to aberrant expression of
chimeric transcripts and the emerging technologies that may lead to the
identification of novel gene fusions.

2.  Gene  fusions  in  epithelial  cancers 

In order to highlight the number of genomic rearrangements
leading to fusion genes that characterize epithelial cancers, we have
surveyed some of the well-studied chimeras from several solid
malignancies and describe the fusions in approximate chronological
order (Fig. 1). In the ensuing sections, we will analyze concepts from a
global view of epithelial gene fusions with a few case studies of
rearrangements from leukemia and endometrial stromal tumors.
Gene fusions will be categorized into three different types: (1) those
which alter the transcriptional regulation, (2) those which alter mRNA
regulation and (3) those which alter protein activity. This will be
followed by a discussion of the potential reasons why gene fusions
have not been in the limelight of solid tumor pathogenesis and the
developing technologies that are being used to find novel recurrent
gene fusions in common epithelial tumors.

Fig. 1. Chronology of gene fusion discoveries in epithelial cancers.

2.1. RET-NTRK1

The initial discovery of an epithelial gene fusion in mid-1989
came directly from a novel screening technique used to identify
transforming oncogenes. In this experimental approach, immortalized
NIH3T3 cells were transfected with fragments of tumor cell genomic
DNA and plated in soft agar. DNA was then isolated from the cells and
sequenced or sub-cloned to identify critical fragments. Using this
approach, Martin-Zanca et al. identified the RET-NTRK1 genomic
translocation, providing some of the first insights into the possibility
that recurrent genomic rearrangements were not exclusively hematologic
phenomena [24].

RET (rearranged during transfection) encodes a tyrosine kinase
[25,26] that was originally identified through transfection of DNA
from a human T-cell lymphoma into NIH3T3 cells [27]. NTRK1 is a
membrane-bound tyrosine kinase receptor that regulates neuronal
cell growth, differentiation, and programmed cell death pathways
[28]. Fusion of these two genes results in loss of the NTRK1 signal
sequence, giving rise to cytoplasmic localization and constitutive
activation of the fusion protein [29]. Interestingly, although NTRK1
was the first identified RET fusion partner, RET has several other
N-terminal fusion partners including H4 [30,31], R1a [32], RFG5 [33] and
ELE1 [34,35]. One possible explanation for the diversity of genomic
rearrangements observed in PTC is that the underlying pathology is
simply dependent on deregulation of either the RET or NTRK1 tyrosine
kinase domain (reviewed in [36]). Consequently, the important
determining event in PTC carcinogenesis may be constitutive activation
of the mitogen-activated protein kinase (MAPK) signaling
pathway, which can be caused by rearrangement of the RET
and/or NTRK1 gene. One reason for this hypothesis is that while the
RET-NTRK1 rearrangement appears to be the predominant gene fusion
responsible for childhood PTC, in adult-onset populations activating
point mutations in the BRAF gene or, controversially, the RAS gene
[37-43] also lead to constitutive activation of the MAPK pathway
without RET and/or NTRK1 genomic rearrangement [44].

In addition to differences in the age-related molecular onset of PTC,
the proportion of cases with either a RET or NTRK1 rearrangement also
appears to be based on the geographic area of origin [45-47], possibly
because thyroid cancer is established to be associated with exposure
to ionizing radiation [37,48]. Indeed, studies of patient populations
exposed to either the Chernobyl nuclear power plant accident [49,50]
or the atomic bombings [51] have demonstrated that genomic
rearrangements occur at a higher frequency than mutations following
extreme exposure to radiation [37,48], suggesting that under certain
biological conditions exposure to high dose radiation may actually
trigger specific DNA breaks leading to intentional genomic
rearrangement. In fact, the fusion proteins that characterize PTC contain a
number of different N-terminal partners fused to the C-terminal tyrosine
kinase domain of either RET or NTRK1 [52] that may depend on the
environmental cues leading to genomic rearrangement.

2.2.  CTNNB1-PLAG1 

Within a year of publication of the RET-NTRK1 genomic rearrangement
in PTC, another epithelial translocation was reported in
pleomorphic adenoma (PA) [53], a slow-growing epithelial tumor
that is responsible for more than 50% of salivary gland tumors [54], but
less than 10% of tumors from the head and neck [55]. In contrast to
RET-NTRK1, which was discovered by a screening technique,
rearrangements in PA were first identified by karyotypic analysis of primary
tumors. In fact, before any of the breakpoint genes were identified, PAs
were already divided into four cytogenetic groups (reviewed in [56]).
Rearrangements of 8q12 account for about 40% of PAs, with
t(3;8)(p21;q12) comprising about half of rearrangements at this locus.
Translocations of 12q14-15 account for about 8% of PAs, with
t(9;12)(p12-22;q13-15) or an ins(9;12)(p12-22;q13-15) responsible for
these abnormalities [57,58]. Tumors with non-recurrent clonal
changes comprise about 20% of PAs, and tumors with apparently
normal karyotypes account for the remaining cases [56].

Almost 20 years after the initial karyotyping studies, Kas et al. used
a comprehensive breakpoint mapping approach, Southern blot
analysis and 5' rapid amplification of cDNA ends (5' RACE) to identify
the genes involved in the most prevalent PA translocation,
t(3;8)(p21;q12), as β-Catenin (CTNNB1) and PLAG1 (pleomorphic adenoma gene
1) [59]. Specifically, the t(3;8)(p21;q12) rearrangement fuses the
β-Catenin (CTNNB1) promoter and exon 1 to PLAG1 exon 2, resulting in a
marked increase in PLAG1 expression (Fig. 2). As such, because the
gene fusion results in altered DNA-level regulation of the PLAG1
transcript, this gene fusion is characterized as type 1. Interestingly, the
reciprocal translocation links the PLAG1 promoter and exon 1 to β-Catenin
exon 2, reducing β-Catenin expression. As β-Catenin signals through
several well-characterized oncogenic pathways (reviewed in [60]),
the reduction in β-Catenin is curious. PLAG1, however, belongs to the
PLAG family of proteins and encodes a zinc finger protein with two
putative nuclear localization signals and can bind to either DNA or
RNA. Forced expression of PLAG1 in NIH3T3 cells has demonstrated
that this protein can induce the standard characteristics of neoplastic
transformation including loss of cell-cell contact inhibition,
anchorage-independent growth, and tumor formation in nude mouse xenografts
[61]. This suggests that the constitutive activity of the CTNNB1
promoter leads to sufficient PLAG1 expression for malignant
transformation in PA.

Fig. 2. Genomic structure of gene fusions with altered transcriptional regulation. The CTNNB1-PLAG1 and TMPRSS2-ERG chimeras represent an important class of gene fusions in
which the proto-oncogene remains largely intact, but the genomic rearrangement places a new promoter and 5'-UTR upstream of the main coding sequence, leading to aberrant
expression of the proto-oncogene.

2.3.  PRCC-TFE3 

As cloning and molecular strategies improved in the early 1990s,
another recurrent gene fusion would soon be described in papillary
renal cell carcinoma (PRCC), the second most common carcinoma of
the renal tubules, accounting for 15-20% of all renal cell carcinomas
[62-66]. Karyotypic analysis as early as 1986 (de Jong et al.) led to the
identification of abnormalities in the Xp11.2 region characterized by a
genomic rearrangement, t(X;1)(p11.2;q21.2) [62-66]. Interestingly,
before any of the genes surrounding the breakpoint were cloned, a
gene encoding TFE3, which was originally identified by its ability to
bind to μE3 elements in the immunoglobulin heavy chain intronic
enhancer [67], was mapped to the Xp11.22 locus [68], and later shown
to encode a member of the basic helix-loop-helix followed by a
leucine zipper family (bHLHzip) of transcription factors. After the
original genomic mapping, TFE3 was soon identified at the translocation
breakpoint by Southern blot analysis [69]. Subsequent 5'-RACE
identified PRCC, a ubiquitously expressed gene that encodes a protein
with a high proportion of prolines and glycines, including three
P-X-X-P motifs that are known to interact with SH3 domains [70,71].
Interestingly, the fusion event leading to the PRCC-TFE3 rearrangement
also results in a reciprocal TFE3-PRCC gene fusion [69,72].

To elucidate the properties of these reciprocal gene fusions,
Weterman et al. introduced wild type PRCC, wild type TFE3, PRCC-TFE3
and TFE3-PRCC expression vectors into COS cells and postulated
that only the PRCC-TFE3 gene fusion was responsible for tumor
formation based on its ability to activate a generalized reporter assay
[73]. Thus, the PRCC-TFE3 genomic rearrangement is type 3, as the
fusion protein gained a novel function through rearrangement.
However, the PSF and NonO pre-mRNA splicing factors are
also recurrently fused to TFE3, albeit at a much lower frequency than
PRCC [69,72,74], suggesting that the TFE3 portion of the fusion is
responsible for malignant transformation. Subsequent transcriptional
activation assays demonstrated that of the PSF-TFE3, NonO-TFE3 and
PRCC-TFE3 chimeras, only the PRCC-TFE3 fusion protein could activate
the plasminogen activator inhibitor-1 (PAI-1) promoter [75],
suggesting that only this gene fusion retains transcriptional activity. However,
recent co-immunoprecipitation experiments demonstrated that
antibodies against the pre-mRNA splicing factors SC35, PRL1, and CDC5
were able to immunoprecipitate wild type PRCC, and an anti-SM
antibody was able to immunoprecipitate the PRCC-TFE3 fusion
protein [75]. These data suggest that although the fusion protein
may partially function through transcriptional pathways, it may also
function by altering pre-mRNA splicing, but more conclusive
experiments need to be conducted to demonstrate this phenotype.

2.4. HMGA2, evading let-7

While most of the gene fusions discovered until this point,
including PRCC-TFE3, were thought to define specific epithelial
tumor types, a new gene fusion that was associated with several
different tumor types, including pleomorphic adenoma (PA) (see
above), lipoma, uterine leiomyoma and some myeloid malignancies
[76], would refute this notion. In fact, the discovery of translocations
involving 12q15 had been established by karyotypic analysis in
multiple tumor types before the rearranged genes were actually
identified, and one of the genes involved in the t(9;12)(p12-22;q13-15)
PA translocation was first identified in both mesenchymal tumors
[77] and lipomas [78]. The first gene to be described was the 5' gene
fusion partner, HMGA2 (high mobility group AT-hook 2), which belongs to
the non-histone chromosomal high mobility group (HMG) protein
family, whose members are small nuclear proteins (<30 kDa) that undergo
extensive post-translational modifications and contain nine-amino-acid
segments that bind AT-rich DNA stretches in the minor groove
(AT-hooks) (reviewed in [79]). Subsequent 3' RACE of tumor samples
revealed that HMGA2 has two different 3' partners in PA, FHIT and
NFIB, both of which contribute very little coding sequence to the
resulting fusion gene. In fact, in one class of translocations, HMGA2
exon 3 is fused to FHIT exon 9 or 10, resulting in retention of the
C-terminal 26 amino acids of FHIT [80], and in the other set, HMGA2
exon 3 or 4 fusion to NFIB exon 9 appends five amino acids (SWYLG)
to the truncated HMGA2 protein [81].

Surprisingly, transgenic mice overexpressing wild type HMGA2
were observed to have similar phenotypes to mice expressing the
truncated HMGA2 protein found in the PA gene fusions [82-84].
To complicate this observation, in hereditary renal cell carcinoma,
FHIT was previously demonstrated to be fused to the patched-related
gene TRC8 by t(3;8)(p14.2;q24.1) [85,86], and the (SWYLG) amino
acid motif found in the HMGA2-NFIB gene fusion was shown to be
essential for NFIB function [81]. Recent research, however, has shed
light onto the importance of these translocations to neoplastic
transformation.

The discovery that small RNAs called microRNAs can negatively
regulate gene expression through direct binding to a gene's 3'-UTR
has led to the hypothesis that certain microRNAs can function as
tumor suppressors in cancer [87]. Bioinformatic analysis of the
HMGA2 3'-UTR demonstrated that the mRNA contains seven
conserved sites complementary to the let-7 microRNA [88] (depicted
in Fig. 3). To show that the let-7 microRNA negatively influences
HMGA2 expression, Mayr et al. built an HMGA2 3'-UTR-conjugated
luciferase reporter and demonstrated that let-7 represses its
expression [89]. As such, although the genomic rearrangements between
HMGA2 and FHIT or NFIB yield fusion proteins, replacement of a
let-7-regulated 3'-UTR seems to be the critical event because it leads to
HMGA2 overexpression, which is sufficient for neoplastic
transformation. Thus, the HMGA2 genomic rearrangement represents the first of a
novel class of gene fusions, type 2, in which fusion gene activity is
enhanced by loss of mRNA-level regulation (Fig. 3).
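
The idea of a 3'-UTR carrying sites complementary to let-7, and of a fusion transcript losing them, can be illustrated with a simplified seed-match scan. This is a minimal sketch and not the TargetScan algorithm: it only looks for perfect matches to the reverse complement of microRNA nucleotides 2-8. The mature let-7a sequence is the commonly annotated one; the two UTR strings are invented toy sequences, not the real HMGA2, FHIT or NFIB 3'-UTRs, and all function and variable names are hypothetical.

```python
from typing import List

# Mature let-7a sequence, written 5'->3'.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the mRNA motif (5'->3') that pairs with miRNA nucleotides 2-8."""
    seed = mirna[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_sites(utr: str, mirna: str) -> List[int]:
    """Return 0-based start positions of perfect seed matches in a 3'-UTR."""
    utr = utr.upper().replace("T", "U")  # accept DNA or RNA alphabets
    site = seed_site(mirna)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Toy sequences for illustration only (assumptions, not real transcripts).
TOY_WILD_TYPE_UTR = "AAUCUACCUCAGGGACUACCUCUUU"
TOY_FUSION_UTR = "AAUGGCAUUCGGAUUACGCUUU"

if __name__ == "__main__":
    print("seed site:", seed_site(LET7A))
    print("wild-type sites:", find_seed_sites(TOY_WILD_TYPE_UTR, LET7A))
    print("fusion sites:", find_seed_sites(TOY_FUSION_UTR, LET7A))
```

In the toy example the wild-type UTR carries two seed matches and the replacement UTR carries none, which mirrors the type 2 mechanism described above in miniature. Real target prediction also weighs site type, conservation and sequence context.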

2.5. Pax8-PPARγ

In 2000, Kroll et al. employed fluorescence in situ hybridization
(FISH), yeast artificial chromosome mapping and 3' RACE to identify
genes involved in a genomic rearrangement, t(2;3)(q13;p25) [90],
that was originally identified by karyotype analysis of follicular
thyroid carcinomas (FTC), a subset (10-20%) of all thyroid malignancies
[91]. This translocation is thought to be specific to FTC as it has not
been reported in other thyroid tumors or hyperplastic nodules [92]. In
the resulting gene fusion, the Pax8 (Paired box gene 8) gene is fused to
PPARγ (Peroxisome proliferator-activated receptor-γ), a ubiquitously
expressed transcription factor [90]. The Pax8 protein is involved in
thyroid follicular cell development and regulation of thyroid-specific
gene expression [93]. PPARγ plays a major role in a number of
different diseases including obesity, atherosclerosis and diabetes, as well
as cancer (reviewed in [94]). Because Pax8 is a thyroid-specific
transcription factor and because its DNA binding domain is fused
to the C-terminal domains of PPARγ [90], the resulting protein chimera
is thought to have constitutive re-distribution of PPARγ-directed
transcription. In 2005, gene-expression microarray profiling revealed
a distinct signature in follicular thyroid carcinomas harboring the
Pax8-PPARγ gene fusion in which cell growth and chromatin
remodeling pathways were over-represented and protein biosynthesis
pathways were under-represented as compared to follicular
thyroid carcinomas without the translocation [95], suggesting that
PPARγ-transcription is indeed redefined by the gene fusion.

Fig. 3. HMGA2 gene fusions elude the let-7 family of microRNAs. The HMGA2 mRNA structure is shown along with putative let-7 family binding sequences in the HMGA2 3'-UTR.
Results were predicted by TargetScan [202] and three representative microRNAs are shown with their highest probability binding sites of the seven total predicted sites along the
3'-UTR. Distance to each predicted binding site is annotated as nucleotides from the start of the 3'-UTR. Below the wild type HMGA2 mRNA are the HMGA2-FHIT and HMGA2-NFIB
mRNAs that result from these two gene fusions. TargetScan did not predict any microRNA binding sites in these genes. As such, the HMGA2 gene fusions represent a second class of
gene fusions in which the recombination event allows the proto-oncogene mRNA to evade microRNA-mediated silencing.

Interestingly, follicular thyroid carcinomas were originally thought
to arise from disruption of distinct molecular pathways, either
through the fusion of Pax8 to PPARγ, or through the acquisition of
point mutations leading to the constitutive activation of the G-protein
RAS. In fact, one study reported that 16/33 (49%) of follicular
carcinomas had RAS mutations, 12/33 (36%) had Pax8-PPARγ
rearrangement, only 1/33 (3%) had both, and 4/33 (12%) had neither
[96]. However, in 2006, quantitative reverse transcription PCR analysis
of follicular carcinoma clinical samples demonstrated loss of the
tumor suppressor NORE1A in samples harboring the Pax8-PPARγ
rearrangement, but not in other samples [97]. Because NORE1A
binds to the GTP-bound (activated) RAS protein and suppresses RAS
activity, this discovery suggested that activation of the RAS pathway is
a critical event in pathogenesis of thyroid carcinoma that is altered
either directly by activating mutation, or indirectly by the Pax8-PPARγ
rearrangement.

2.6.  BRD-NUT 

Soon after the discovery of the Pax8-PPARγ rearrangement, the
translocation t(15;19)(q13;p13.1) was identified in a rare, highly
aggressive carcinoma arising in the midline organs and upper
respiratory tract of young people, now termed nuclear protein in
testis (NUT) midline carcinomas (NMC) [98-100]. BRD4, which
contains the chromosome 19 breakpoint, has two annotated
transcripts encoding either short or long forms of the protein that both
contain N-terminal bromodomains. The longer BRD4 transcript
encodes a ubiquitously expressed 200 kDa nuclear protein [101]
with a C-terminal lysine-rich region that is not found in the shorter
transcript. The translocation resulting in fusion to the NUT gene
(identified by Southern blot analysis) only disrupts the longer BRD4
transcript, resulting in the loss of the lysine-rich region in the fusion
oncogene. Several studies of BRD4 in both murine and human cell line
models have demonstrated a critical role in cell cycle progression and
cell proliferation [102,103]. In fact, Brd4 enhances cell growth by
interacting with chromatin [104], replication factor C [102] and
cyclin T1 and CDK9, which constitute core positive transcription
elongation factor b (P-TEFb) [105]. Likewise, chromatin
immunoprecipitation assays demonstrated that Brd4 is required to recruit P-TEFb to
active promoters, and that increased Brd4 leads to increased
P-TEFb-dependent phosphorylation of RNA polymerase and enhanced
transcription from promoters in vivo [105].

More  insight  into  the  role  of  the  BRD4-NUT  fusion  protein  in  NMC 
biology  came  from  a  screen  for  other  NMC  gene  fusions.  Because  the 
BRD4-NUT  translocation  defines  two-thirds  of  all  NMCs,  French  et  al. 
used  a  candidate  gene  approach  to  screen  other  NMC  samples  and 
discovered another recurrent translocation between BRD3 and NUT
that defined a large portion of the remaining NMC cases [106]. The
BRD3-NUT  fusion  gene  encodes  a  protein  highly  similar  to  that 
encoded  by  the  BRD4-NUT  transcript.  It  is  composed  of  two  tandem 
chromatin-binding  bromodomains,  an  extra-terminal  domain,  a 
bipartite  nuclear  localization  sequence,  and  a  significant  portion  of 
NUT  coding  sequence.  As  such,  the  conserved  protein  structure  gave 
insight  into  the  mechanism  by  which  the  chimeric  protein  induces 
neoplastic  properties. 

Wild  type  NUT,  which  is  normally  only  expressed  in  the  testis  [99], 
contains  both  nuclear  localization  and  export  signal  sequences  and  is 
shuttled  between  the  nucleus  and  cytoplasm  via  a  leptomycin- 
sensitive  pathway  [106].  Importantly,  however,  the  Brd3-NUT  and 
Brd4-NUT  proteins  are  retained  in  the  nucleus,  suggesting  that 
interactions  between  the  Brd3  or  Brd4  bromodomains  and  chromatin 
are  essential  to  the  fusion  protein  [106]  (Fig.  4).  Further  evidence  for 
this hypothesis comes from an siRNA experiment in which knockdown
of Brd-NUT fusion transcripts in NMC cell lines resulted in
squamous  differentiation  and  cell  cycle  arrest  [106].  This  suggested 
that  the  nuclear  retention  of  NUT,  not  the  loss  of  the  Brd  C-terminal 
domain,  is  responsible  for  promoting  NMC  carcinogenesis  [106].  The 
realization  that  Brd-NUT  gene  fusions  define  a  class  of  translocations 
that  fuse  bromodomains  to  the  NUT  protein  suggests  that  oncogenic 
translocations  will  arise  from  multiple  partners  when  critical  domains 
are  present  in  more  than  one  gene. 


Fig. 4. Nuclear retention of NUT. The BRD4-NUT gene fusion represents a third class of rearrangements in which the resulting protein gains activity to become a proto-oncogene. In
this case, the two bromodomains of BRD4 are fused to NUT. Although NUT usually cycles between the nucleus and cytoplasm in a highly controlled manner, appendage of the BRD4
bromodomains to the majority of the NUT protein leads to nuclear retention of the protein and aberrant activity.


J.C.  Brenner,  A.M.  Chinnaiyan  /  Biochimica  et  Biophysica  Acta  1796  (2009)  201-215 




2.7.  ETV6-NTRK3 

The first major example of a recurrent epithelial rearrangement
that appeared not only in multiple tumor types, but had also been
reported in a large subset of hematologic malignancies was detected in
several cases of secretory breast carcinoma, a rare subtype of
infiltrating ductal carcinoma affecting both children and adults [107].
Tognon et al. detected the ETV6-NTRK3 fusion by comprehensive FISH
analysis in 92% (12 of 13) of secretory breast carcinoma cases [108]. ETV6
(also known as TEL) is an ETS family member that is involved in a large
number of fusions to either a transcription factor like AML1 [109] or to a
protein tyrosine kinase domain like that of ABL [110,111], JAK2 [112-114],
ARG [115,116], PDGFRβ [117] or FGFR3 [118], each of which defines a unique
leukemia subtype (reviewed in [119]). ETV6 contains a pointed
oligomerization domain (PNT; also known as sterile alpha motif,
SAM, or helix-loop-helix, HLH) and an ETS DNA binding domain, the
expression of which is required for developmental processes such as
hematopoiesis and yolk sac angiogenesis [120]. NTRK3 is a transmembrane
neurotrophin-3 surface receptor that contains a C-terminal
protein tyrosine kinase domain and plays a role in growth, development,
and cell survival of neural cells in the central nervous system
(reviewed in [121]). The fusion of the N-terminal ETV6 pointed
domain to the C-terminal tyrosine kinase domain of NTRK3 was first
reported in congenital fibrosarcoma (CFS) [122], but has since been
reported in multiple cell lineages including those that give rise to
congenital mesoblastic nephroma (CMN), acute myelogenous leukemia,
and secretory breast carcinoma [108] (reviewed in [123]).

Following the initial discovery, research focused on the transforming
ability of the recombination product. By using retroviral gene delivery
methods, the ETV6-NTRK3 fusion gene was shown to be sufficient to
induce the non-tumorigenic murine breast cell lines Eph4 (epithelial)
and Scg6 (myoepithelial) as well as NIH-3T3 fibroblasts to form tumors,
glandular structures and to express epithelial antigens [108]. This
discovery suggested that the fusion gene acts as a dominant oncogene in
secretory breast cancer. ETV6-NTRK3 was also shown to inhibit TGF-β
tumor suppressor activity in NIH3T3 cells [124], suggesting that it most
likely regulates microRNA biogenesis indirectly [125], but this has not
yet been explored. Although it is known that adults have a less favorable
prognosis than children and distant metastases are rare [126], local
recurrences and nodal metastases have been observed [127], suggesting
that the gene fusion leads to an invasion-associated transcriptional
program, but this also has not been explored. Although it is known
that constitutive activation of the fusion protein leads to activation of
the Ras-mitogen-activated protein kinase (MAPK) pathway and the
phosphoinositide-3-kinase (PI3K)-AKT pathway, the mechanism leading
to activation of these pathways remained elusive until recently,
when the fusion protein was shown to associate with c-Src by
immunoprecipitation from fusion-positive CFS and CMN human
primary tumors [128]. More recently, however, a mouse knockin
model was created by introducing the human NTRK3 cDNA into exon 6
of the mouse ETV6 locus, which induced a fully penetrant, multifocal
breast cancer [129]. By using microarray analysis of unsorted and sorted
tumors from this model, as well as NIH3T3 cells transduced with the
fusion gene, the authors showed that ETV6-NTRK3 enriches for WNT
target genes through activation of the AP1 complex [129]. The
requirement for AP1 activity in ETV6-NTRK3-mediated transformation
was confirmed by showing that the co-expression of a dominant
negative component of the AP1 complex, c-JUN TAM67, with the gene
fusion blocked tumorigenic properties both in vitro and in vivo [129].
The ETV6-NTRK3 gene fusion represents one of the last gene fusions to
be discovered by traditional biological techniques.

2.8.  TMPRSS2-ETS 

In  2005,  advances  in  bioinformatics  led  to  the  discovery  of 
rearrangements  on  chromosome  21  between  TMPRSS2  (transmem¬ 


brane  protease,  serine  2)  and  ERG  (v-ets  erythroblastosis  virus  E26 
oncogene  homolog  (avian))  resulting  in  the  TMPRSS2-ERG  gene 
fusion.  Thus  far,  genomic  rearrangements  leading  to  an  ERG  gene 
fusion  have  been  reported  in  approximately  50%  of  clinically  localized 
prostate cancers (reviewed in [130]). TMPRSS2 is a prostate-specific,
androgen-regulated gene [131-133] that has two annotated
transcript variants, both of which are involved in the fusion with
ERG: the annotated TMPRSS2 in about 50% of the gene fusions, an
alternative  TMPRSS2  variant  in  10%  of  gene  fusions,  and  both  variants 
in  slightly  more  than  40%  of  analyzed  gene  fusions  [134].  ERG  belongs 
to  the  ETS  family  of  transcription  factors  and  has  two  transcription 
variants that differ only slightly in the 5'-UTR (deleted in the gene
fusion)  and  in  the  usage  of  an  in-frame  exon,  the  role  of  which  remains 
undefined.  The  most  common  TMPRSS2-ERG  gene  fusion  variants 
involve  TMPRSS2  exon  1  or  2  fused  to  ERG  exon  2,  3,  4,  or  5  [134-143] 
and  less  frequently  rearrangements  of  TMPRSS2  exon  4  or  5  fused  to 
ERG  exon  4  or  5  [141].  In  line  with  the  combinatorial  complexity  of 
TMPRSS2-ERG  rearrangements,  different  fusions  have  correlated 
with  slightly  different  phenotypic  outcomes.  For  example,  TMPRSS2 
exon  2  fused  with  ERG  exon  4  is  associated  with  aggressive  disease, 
while  others  have  been  associated  with  seminal  vesicle  invasion  and 
poor  outcome  [143]. 

Like TMPRSS2, the TMPRSS2-ERG gene fusion is androgen-regulated
in an androgen-responsive cell line (VCaP) carrying the rearrangement
[135], but not in an androgen-insensitive cell line harboring the
fusion  (NCI-H660)  [144].  We  have  shown  that  VCaP  cells  and  benign 
prostate  cells  forced  to  overexpress  ERG  drive  components  of  the 
plasminogen  activation  pathway  to  mediate  cellular  invasion  using 
transwell  migration  assays  [145].  We  have  also  reported  that  primary 
or  immortalized  benign  prostate  epithelial  cells  overexpressing  ERG 
have  a  transcriptional  program  with  high  levels  of  several  invasion- 
associated  genes,  but  did  not  display  phenotypic  increases  in  cellular 
proliferation  or  anchorage-independent  growth  [145].  Despite  this, 
one  group  recently  identified  c-MYC  as  a  downstream  target  of  ERG 
and  demonstrated  that  ERG  knockdown  in  TMPRSS2-ERG  expressing 
CaP  cells  resulted  in  loss  of  cell  growth  in  vitro  and  loss  of 
tumorigenicity in vivo, with only 22% (2/9) of mice developing detectable
tumors at day 42 with siRNA-treated cells as compared to 100% (5/5) in
the  control  group  [146].  Interestingly,  transgenic  mice  expressing  an 
androgen-regulated ERG fusion gene develop mouse prostatic intraepithelial
neoplasia (PIN), a precursor lesion of prostate cancer, not
prostate  cancer.  Taken  together  with  our  in  vitro  data,  these  results 
suggest  that,  without  secondary  molecular  lesions  such  as  loss  of  the 
tumor suppressors PTEN or NKX3-1, the TMPRSS2-ERG gene fusion
may  not  be  sufficient  for  transformation  [145,147,148]. 

Although  ERG  clearly  participates  in  the  majority  of  ETS  family 
gene  fusions  in  prostate  cancer,  other  ETS  family  members  including 
ETV1  [135],  ETV4  [149,150]  and  ETV5  [151]  also  contribute  to  gene 
fusions  in  prostate  cancer,  albeit  at  a  much  lower  frequency.  In 
contrast  to  TMPRSS2,  which  is  the  only  known  5'  partner  to  ERG,  the 
other  ETS  family  members  may  have  a  variety  of  5'  partners  including 
those with androgen-responsive promoters (TMPRSS2, SLC45A3, KLK2,
HERV-K_22q11.23 and CANT1), one with an androgen-insensitive but
constitutively active promoter (HNRPA2B1), and one
with an androgen-repressed promoter (C15orf21) [135,149,151-153].
As  in  the  case  of  ERG,  forced  expression  of  ETV1  under  the  control  of  a 
CMV  promoter  did  not  enhance  cell  proliferation  in  benign  prostate 
epithelial  cell  lines  and  did  not  lead  to  anchorage-independent  colony 
formation  in  soft  agar,  but  did  lead  to  the  enrichment  of  genes 
associated with invasion [145]. Consistent with this, knockdown of ETV1 in
LNCaP cells prevented transwell invasion through Matrigel [145,154].

2.9.  EML4-ALK 

Recently,  Soda  et  al.  reported  a  retroviral-mediated  transformation 
screen,  in  which  they  created  a  cDNA  expression  library  from  a 
surgically resected lung adenocarcinoma [155]. Following transformation
of NIH3T3 cells, cDNAs were recovered from cells by PCR
amplification  and  sequenced.  One  of  these  sequenced  transcripts 
contained a fusion between EML4 (echinoderm microtubule-associated
protein-like 4) and ALK (anaplastic lymphoma kinase) that was
later confirmed as an inversion of chromosome 2p in 6.7% (5 of 75) of
NSCLC patients [155]. Wild type EML4 is a member of the EMAP family
of proteins, and its amino-terminus (amino acids 1-249) was
previously demonstrated to be essential for microtubule formation
in  HeLa  cells  [156].  ALK  encodes  a  tyrosine  kinase  and  a  MAM  domain 
(a  domain  frequently  found  on  the  extracellular  side  of  the  membrane 
on many receptors). Despite the apparently low frequency of EML4-ALK
gene fusions in NSCLC, the transforming ability of EML4-ALK gene
fusion variants 1, 2, and 3b, but not of a kinase-inactive mutant (K589M),
has been demonstrated by engrafting NIH-3T3 cells infected with
retroviral expression vectors and showing that tumors arose in 8/8
mice from all groups except for the kinase-dead mutant [157].

To corroborate the low frequency of EML4-ALK rearrangements in
NSCLC, a careful PCR-based analysis of NSCLC cases was carried out to
search for novel in-frame EML4-ALK gene fusions, which led to the
identification of two novel fusion isoforms called variants 3a and 3b
[157]. Even more recently, analysis of a cohort of 253 lung
adenocarcinoma  patient  samples  identified  two  new  EML4-ALK 
fusions in which either exon 14 or exon 2 of EML4 was fused to exon
20 of ALK (variants 4 and 5, respectively); however, only 4.35% of
patients were found to express any of the 5 known EML4-ALK
genomic rearrangements [158]. A similarly low rate of the EML4-ALK
fusion  was  reported  in  a  study  of  104  lung  cancer  surgical  specimens 
with  only  one  fusion-positive  case  [159]  and,  in  a  study  of  different 
lung cancers, the fusion was identified in 3.4% (5 of 149) adenocarcinomas,
but not in 48 squamous cell carcinomas, 3 large-cell
neuroendocrine carcinomas, or 21 small-cell carcinomas [160].
However, the absence of fusions in the non-adenocarcinomas is to be
expected, given the small number of such cases analyzed.
The ALK gene has previously been identified as
the 3' fusion partner of NPM- [161], TPM3- [162], CLTC- [163], ATIC-
[164-166]  and  TFG-  [167].  In  light  of  this  observation,  RT-PCR  analysis 
was  used  to  screen  all  known  hematologic  ALK  fusion  partners  in  a 
cohort of 77 NSCLC samples; however, no redundant fusion partners
were  identified  and  only  2.6%  (2  of  77)  of  NSCLC  cases  harbored  the 
EML4-ALK  fusion  [168].  To  supplement  the  existing  RT-PCR  data  in  the 
literature,  our  group  developed  a  break-apart  FISH  assay  to  analyze 
EML4-ALK fusion as well as the amplification of each gene. We
reported  the  fusion  occurred  in  less  than  3%  of  NSCLC  cases  analyzed, 
and  that,  in  most  cases  harboring  the  lesion,  not  all  cells  exhibited  the 
fusion.  We  also  found  that  EML4  and/or  ALK  amplification  occurred, 
indicating  that  other  mechanisms  of  genomic  rearrangement  leading 
to  amplification  may  arise  [169]. 
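
The cohort frequencies quoted above can be put on a common footing with a short calculation. The following is a minimal sketch, not an analysis from any of the cited studies: it takes the counts stated in the text, attaches an approximate 95% Wilson score interval to each, and estimates how likely it would be to see no fusion among the 72 non-adenocarcinomas (48 + 3 + 21) if they carried the fusion at the adenocarcinoma rate. The cohort labels, the choice of the Wilson interval and the assumption of independent samples are ours.

```python
from math import sqrt


def wilson_interval(positives, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = positives / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def prob_no_positives(rate, n):
    """Chance of zero fusion-positive cases among n independent samples."""
    return (1 - rate) ** n


# (fusion-positive cases, cases analyzed) as quoted in the text.
# The dictionary keys are illustrative labels, not study titles.
COHORTS = {
    "NSCLC patients [155]": (5, 75),
    "surgical specimens [159]": (1, 104),
    "adenocarcinomas [160]": (5, 149),
    "RT-PCR screen [168]": (2, 77),
}

for label, (positives, total) in COHORTS.items():
    low, high = wilson_interval(positives, total)
    print(f"{label}: {positives / total:.1%} "
          f"(approx. 95% CI {low:.1%} to {high:.1%})")

# Assumption: non-adenocarcinomas share the adenocarcinoma rate of 5/149.
print(f"P(no fusion in 72 cases) = {prob_no_positives(5 / 149, 72):.3f}")
```

The wide intervals illustrate why the individual cohort estimates are compatible with one another despite their different point values.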

2.10.  SLC34A2-ROS 

In  2007,  a  survey  of  phosphotyrosine  signaling  in  lung  cancer  not 
only  led  to  the  re-identification  of  the  EML4-ALK  fusion,  but  also  the 
discovery of a novel translocation between chromosomes 4p15 and
6q22, in which the transmembrane domain-containing N-terminal
region of the solute carrier family 34, member 2 (SLC34A2) is fused to
an N-terminal transmembrane domain of the c-ros oncogene 1
(ROS), respectively, in the lung cell line HCC78 [170]. SLC34A2 is
encoded by a single transcript variant, as is ROS, a type I
integral membrane tyrosine kinase and known oncogene
that is highly expressed in several tumor cell lines.
Interestingly, while the authors did not
identify  SLC34A2  rearrangements  with  ROS  in  patient  samples,  a  gene 
fusion  between  CD74,  located  at  5q32,  and  ROS  was  observed,  in 
which  the  tandem  transmembrane  domain  structure  was  again 
observed [170]. This suggests not only that ROS is another promiscuous
gene fusion partner, but also that the tandem transmembrane structure is
one mechanism leading to constitutive activation of the tyrosine
kinase.  Indeed,  forced  expression  of  the  SLC34A2-ROS  chimera 
demonstrated  constitutive  kinase  activity  in  the  cellular  membrane 
fraction  [170]. 

2.11.  SLC45A3-ELK4 

With  the  recent  advent  of  next  generation  sequencing  technology 
(described  below),  our  group  has  recently  identified  another 
recurrent  gene  fusion  in  prostate  cancer  [171].  Using  this  technology 
we identified the fusion of SLC45A3 to ELK4, an ETS family member.
Here  exon  4  of  SLC45A3  is  fused  to  exon  1  of  ELK4.  Interestingly,  this 
novel  gene  fusion  was  identified  from  the  RNA  of  a  cell  line  harboring  a 
known  gene  fusion  involving  another  ETS  family  member  gene,  ETV1. 
Likewise  this  novel  gene  fusion  involves  SLC45A3,  which  is  known  to 
fuse  with  ETV1  in  other  prostate  cancer  cases.  Unlike  other  gene 
fusions  described  to  this  point,  SLC45A3-ELK4  seems  to  result  from 
polymerase  read-through  and  intergenic  splicing  rather  than  genomic 
rearrangement, as no alterations were detected at the DNA
level by fluorescence in situ hybridization (FISH), array comparative
genomic hybridization (aCGH) or high-density single nucleotide polymorphism
(SNP) arrays [171]. RNA level gene fusions were recently
identified  in  endometrial  stromal  tumors  and  are  discussed  below. 

3.  Lessons  from  MLL  translocations 

While  the  list  of  epithelial  derived  gene  fusions  continues  to 
expand,  it  is  important  to  highlight  unique  mechanisms  of  oncogene 
formation through specific genomic rearrangements from the hematological
malignancies. Translocations altering the mixed-lineage
leukemia (MLL) gene on 11q23 frequently lead to fusions with over
40  different  genes  on  different  chromosomes  with  MLL-AF4  and  MLL- 
AF9  among  the  most  frequent  chimeras  (reviewed  in  [172,173]). 
Interestingly,  different  MLL  fusions  are  highly  associated  with  either 
acute myeloid leukemia (AML) or acute lymphoid leukemia (ALL),
depending on the fusion partner [174]. MLL is the mammalian
homologue  of  a  Drosophila  gene  called  trithorax ,  which  was  shown 
to  play  a  critical  role  in  axial  morphogenesis  and  patterning  during 
embryogenesis through the regulation of HOX genes (HOM-C in Drosophila)
[175,176]. Multiple studies have suggested that deregulation
of  HOX  gene  expression  contributes  to  leukemogenesis  [177]. 
Additionally, retroviral transduction of an MLL fusion gene construct
was able to transform wild type, but not Hoxa9-deficient, bone
marrow cells, providing direct evidence that specific HOX gene
expression may be required for leukemogenesis [178]. Because MLL
chimeras  often  lose  large  fragments  and  different  domains  from  either 
the  N-  or  C-terminal  regions,  the  seemingly  critical  role  of  MLL- 
associated  HOX  gene  expression  to  leukemogenesis  led  to  the  question 
of  whether  the  molecular  mechanisms  by  which  wild  type  MLL 
regulates  gene  expression  are  mutually  exclusive  from  those 
employed  by  MLL  chimeras  [179]. 

As the molecular mechanisms of MLL target gene regulation
continue to be unraveled, several studies have suggested that
molecular function may differ between the wild type and fusion gene
settings, even though the outcome of gene activity is ultimately similar.
Wild type MLL encodes a multi-domain protein with three AT-hooks
used for binding AT-rich DNA sequences and a histone methyltransferase
domain [180] and assembles into supercomplexes containing
several  different  chromatin  remodeling  enzymes  on  target  DNA  motifs 
like  those  found  in  HOX  genes  [181].  Chimeric  MLL  proteins,  on  the 
other  hand,  appear  to  utilize  different  mechanisms  to  modulate  HOX 
gene expression and initiate leukemogenesis. For example, fusion of
coiled-coil domains from GAS7 or AF1p to MLL endows the chimeric
protein with the ability to dimerize on the target gene promoters and
has been suggested to stimulate transcription through the inappropriate
recruitment of members of the MLL supercomplex [182].
This suggested that preventing dimerization of the coiled-coil
domains  with  targeted  small  molecules  could  inhibit  MLL  activity  in 
this  subset  of  MLL  fusions.  In  contrast,  some  MLL  fusions  lead  to 
constitutive  nuclear  retention  while  maintaining  similar  binding 
patterns  as  the  dimerizable  MLL  chimeras  on  the  HoxA9  locus  [183]. 
In  the  absence  of  a  partner  gene,  MLL  can  acquire  an  in-frame  partial 
tandem  duplication  (PTD)  of  exons  5  through  11  (occurring  in 
approximately  4%-7%  of  AML  cases)  that  causes  overexpression  of 
HoxA7, HoxA9, and HoxA10 in spleen, bone marrow, and blood in a knockin
mouse  model  [184].  As  such,  altering  downstream  HOX  gene 
expression  appears  to  be  one  critical  role  of  MLL  gene  fusions  and 
rearrangements. 

Given  that  wild  type  and  chimeric  MLL  proteins  appear  to 
accomplish  at  least  one  similar  molecular  function  ( HOX  gene 
regulation),  the  question  of  how  epithelial  gene  fusions  will  function 
in  comparison  to  their  wild  type  counterparts  remains  intriguing.  For 
example,  we  have  very  little  understanding  of  the  normal  molecular 
mechanisms  utilized  by  ERG  and  ETV1  to  control  gene  expression 
(prostate  cancer  gene  fusions,  discussed  above),  let  alone  the  critical 
co-factors  required  for  transcriptional  regulation.  Although  we  may 
expect  the  molecular  mechanisms  of  ERG  and  ETV1  mediated  gene 
regulation  to  be  the  same  in  the  wild  type  and  fusion  settings  (because 
the  encoded  proteins  are  nearly  identical),  this  remains  to  be  proven. 
Perhaps  the  ability  to  design  rational  drug  targets  against  specific 
fusion  proteins  without  obvious  molecular  susceptibilities  (like  the 
tyrosine  kinase  activity  of  BCR-ABL)  will  depend  as  much  on  our 
understanding  of  each  fusion  protein's  function  and  critical  co-factors 
as  on  their  downstream  targets. 

4.  Difficulty  in  identifying  epithelial  cancer  gene  fusions 

With  the  discovery  of  the  TMPRSS2-ERG  gene  fusion  in  prostate 
cancer, we look back on the history of cancer biology and wonder why
gene fusions have not been identified in some of the most well-studied
epithelial cancers. Part of the problem was methodological, as
the  chromosome  quality  in  epithelial  neoplasms  is  very  poor  when 
compared to hematologic neoplasms. However, cytogenetic techniques
have improved dramatically since the discovery of the “minute”
chromosome  in  1960  [6].  In  fact,  in  the  1960s,  chromosome  patterns  in 
epithelial  tumors  were  already  being  described  as  abnormal  [185]  and 
it  was  often  thought  that  the  degree  of  cytogenetic  changes 
corresponded  proportionally  with  clinical  progression  [186],  making 
the  identification  of  individual  and  recurrent  translocations  difficult. 
In  fact,  the  idea  that  the  induction  of  genomic  instability  is  a  critical 
and  intended  step  in  the  malignant  progression  of  solid  tumors  has 
gained considerable momentum [187,188]. Recently, it was demonstrated
that overexpression of Separase, a protein that is overexpressed
in a subset of breast cancers, can induce
chromosome instability and aneuploidy in the mutant p53 mouse
mammary epithelial cell line FSK3 [189]. Likewise, deregulation of
Mad2,  which  regulates  separase  activity,  has  been  shown  to  promote 
chromosomal  instability,  induce  aneuploidy  and  lead  to  tumorigenesis 
[190].  Interestingly,  once  Mad2-induced  neoplastic  transformation 
has occurred, Sotillo et al. demonstrated that expression of Mad2 is no
longer required for tumor progression, suggesting that the induction of
chromosomal  instability  could  be  a  transient  event  in  oncogenesis 
[190].  In  fact,  it  is  possible  that  specific  gene  fusions  induce  genomic 
instability  through  deregulation  of  normal  mitotic  events  like 
separase  or  Mad2  activity  or  through  novel  mechanisms  yet  to  be 
described.  If  induction  of  chromosomal  instability  was  a  mechanism  of 
oncogenesis  employed  by  a  specific  gene  fusion,  then  induction  of 
other  secondary  “carrier”  chromosomal  rearrangements  would  simply 
serve to mask the identification of the recurrent genetic rearrangement.
Such a progression pattern in epithelial tumors could explain
the  complex  heterogeneity  often  observed  in  such  malignancies 
(Fig. 5). In contrast, leukemias, lymphomas and mesenchymal tumors
are almost 95% clonal [191]. As such, the complexity and sheer number
of genomic rearrangements in epithelial malignancies have led to
difficulty in defining primary aberrations in these neoplasms. This
difficulty eventually led to the incorrect notion that genomic
rearrangements leading to gene fusions were simply less common
in epithelial tumors.

Fig. 5. Difficulty in discovering gene fusions. One possibility is that a critical function of
oncogenes in epithelial cancers is to alter genomic structure and it has been suggested
that such changes could lead to cancer progression. However, if such a model were true,
it would give a reason for the genomic heterogeneity observed in epithelial cancers that
has allowed recurrent gene fusions to go unnoticed in solid tumors.

5.  Mitelman  hypothesis 

In order to address this notion that fusion genes are almost
exclusively a hematologic phenomenon, Mitelman et al. completed a
comprehensive study of all known cytogenetically abnormal neoplasms
reported in the literature [192]. Importantly, data published
by  the  group  supported  the  counter-hypothesis  that,  in  every  tumor 
type,  the  numbers  of  recurrent  balanced  chromosome  abnormalities, 
gene  fusions  and  balanced  rearrangements  are  a  function  of  the  total 
number  of  analyzed  cases  [192].  In  this  study,  271  gene  fusions  and 
59 potential gene fusions (only one gene identified at the breakpoint)
were catalogued, of which 275 unique genes were involved in
the  rearrangements  [192].  This  indicated  that  a  substantial  number 
of  genes  were  present  in  more  than  one  chimeric  transcript  (e.g., 
MLL,  ETV6  and  RET  as  described  above).  In  classifying  each  gene 
fusion  by  the  class  to  which  each  member  of  the  chimera  belonged, 
the  group  demonstrated  that  the  proportion  of  fusions  belonging  to 
each  class  was  approximately  equal  in  both  hematologic  and  solid 
tumor  malignancies,  with  the  transcription  factor  class  accounting 
for 38-44% and the tyrosine kinase class accounting for 5-7% [192]. This
study  suggested  that  the  occurrence  of  gene  fusions  is  a  general 
molecular  event  that  has  no  fundamental  tissue-specific  differences. 
However, some gene rearrangements are likely to function preferentially in
specific genetic backgrounds; the TMPRSS2-ERG fusion, for example,
requires active androgen signaling, which favors
prostate specificity.

6.  Tissue-specific  gene  fusions 

The  idea  that  genomic  rearrangements  are  tissue-specific  is  an 
emerging  concept  in  the  field  of  gene  fusion  biology.  For  example, 
TMPRSS2  is  a  strongly  androgen-regulated  and  prostate-specific  gene 
that  is  fused  to  the  ETS  family  members  ERG  and  ETV1  in  prostate 
cancer [135]. While other ETS family members form fusion genes
that give rise to other malignancies, chimeras between androgen-regulated
genes and ETS genes have only been observed in prostate
cancer [130]. Likewise, the ALK tyrosine kinase is frequently fused to
multiple partners in hematopoietic (myelogenous leukemia),
mesenchymal (congenital fibrosarcoma) and epithelial (secretory
breast carcinoma) malignancies, but no redundant fusion partners
have been identified across tissue types [159]. Retention of the TFE3
DNA binding domain in follicular thyroid carcinoma is another
example of this, as TFE3 is a thyroid-specific transcription factor
[93]. Importantly, little is understood about the molecular mechanisms
leading to gene rearrangement and the underlying reasons that
particular chimeras are formed recurrently. The idea that tissue-
specific rearrangements occur by fusing highly transcribed genes
holds promise and would at least partially explain the apparent tissue
specificity observed in the formation of chimeric transcripts even
between genes that are fused in multiple cancer types.

Table 1
Chromosomal rearrangements in epithelial cancers.

Malignancy | Gene fusion | Chromosome rearrangement | Method of discovery | Study | Ref.
--- | --- | --- | --- | --- | ---
Follicular thyroid carcinoma | PAX8-PPARγ | t(2;3)(q13;p25) | Primary tumor karyotypic analysis/FISH/3' RACE | Kroll et al. | [90]
Midline carcinoma | BRD3-NUT | t(9;15)(q34;q14) | Candidate gene FISH screen | French et al. | [106]
 | BRD4-NUT | t(15;19)(q14;p13) | Primary tumor karyotypic analysis/FISH/Southern blot | French et al. | [98]
Non-small-cell lung cancer | EML4-ALK | inv(2p) | Transformation assay/direct sequencing | Soda et al. | [155]
 | TFG-ALK | t(2;3)(p23;q12) | Tyrosine kinase activity screen/5' RACE | Rikova et al. | [170]
 | SLC34A2-ROS | t(4;6)(p15;q22) | | |
Papillary renal cell carcinoma | PRCC-TFE3 | t(X;1)(p11;q23) | Primary tumor karyotypic analysis/Southern blot/5' RACE | Sidhar et al. | [69]
Papillary thyroid carcinoma | RET-NTRK1 | t(1;10)(q21;q11) | Transformation assay/direct sequencing | Martin-Zanca et al. | [24]
Pleomorphic adenoma | CTNNB1-PLAG1 | t(3;8)(p21;q12) | Primary tumor karyotypic analysis/breakpoint mapping/Southern blot/5' RACE | Kas et al. | [59]
 | HMGA2-FHIT | t(3;12)(p14;q15) | Primary tumor karyotypic analysis/3' RACE | Geurts et al. | [80]
 | HMGA2-NFIB | t(9;12)(q24;q15) | Primary tumor karyotypic analysis/3' RACE | Geurts et al. | [81]
Prostate cancer | TMPRSS2-ERG | del(21)(q22) | COPA/exon walking/5' RACE | Tomlins et al. | [135]
 | TMPRSS2-ETV1 | t(7;21)(p21;q22) | | |
 | TMPRSS2-ETV4 | t(17;21)(q21;q22) | | Tomlins et al. | [149]
 | TMPRSS2-ETV5 | t(3;21)(p28;q22) | | Helgeson et al. | [151]
 | SLC45A3-ELK4 | del(1)(q32) | Integrated high-throughput sequencing | Maher et al. | [171]
 | DDX5-ETV4 | t(17)(q24;q21) | Candidate gene FISH screen/5' RACE | Han et al. | [150]
Secretory breast carcinoma | ETV6-NTRK3 | t(12;15)(q13;q25) | Primary tumor karyotypic analysis/FISH | Tognon et al. | [108]

The  idea  that  gene  fusions  are  tissue-specific  could  have  profound 
implications  on  the  discovery  of  novel  gene  fusions.  Clearly,  however, 
gene  fusions  do  not  always  confer  tissue  specificity.  HMGA2  has  a  3'- 
UTR  that  is  negatively  regulated  by  the  Let7  microRNA  and  simply 
replaces  its  3'-UTR  through  rearrangement  with  another  gene 
(described  above),  therefore  representing  a  gene  fusion  that  most 
likely  retains  functionality  in  multiple  tissue  types.  As  such,  while  this 
concept  may  have  its  largest  impact  on  underlying  molecular 
mechanisms  of  newly  discovered  gene  fusions,  it  will  probably  not 
alter the rate of gene fusion discovery.

7.  Discovery  of  novel  gene  fusions 

Although the rate of recurrent chromosomal rearrangement discovery
in epithelial tumors has been modest, the recent discovery of gene
fusions in prostate cancer has led to a renewed interest in gene fusion
identification in other epithelial cancer subtypes. Perhaps the best
explanation  for  the  sudden  increase  in  the  characterization  of 
recurrent  gene  fusions  is  the  advent  of  novel  technologies  (Table  1). 
For  example,  the  use  of  existing  gene-expression  data  in  the  discovery 
of  novel  gene  fusions  was  limited  until  the  emergence  of  cancer 
outlier  profile  analysis  (COPA),  which  ranks  genes  by  normalizing 
expression  values  based  on  median  absolute  deviation  of  gene 
expression  to  accentuate  outlier  profiles  (reviewed  in  [130]).  When 
COPA  was  applied  to  gene-expression  datasets  in  the  Oncomine 
database  [193-196],  the  analysis  was  able  to  identify  several  hallmark 
cancer  related  genes  and  led  to  the  discovery  of  the  ERG  and  ETV1 
outlier  profiles  in  prostate  cancer  [135].  Subsequent  exon-walking 
quantitative PCR was used to demonstrate loss of the 5' exons in both
ERG and ETV1, giving rise to the notion that a gene fusion event was
responsible  for  the  outlier  expression  of  these  genes  in  prostate 
cancer.  Finally,  5'-RNA  ligase-mediated  rapid  amplification  of  cDNA 
ends  (5'-RACE)  was  used  to  identify  the  5'  untranslated  region  of 
TMPRSS2,  a  prostate-specific,  androgen-regulated,  transmembrane 
serine protease gene [131,132,197]. Fusion-specific PCR and fluorescence
in situ hybridization (FISH) were used to confirm the genomic
rearrangement.
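
The outlier-ranking idea behind COPA can be illustrated with a small sketch. The text states only that expression values are normalized using the median absolute deviation; the remaining choices here are assumptions for illustration, namely median-centering each gene, scoring each gene by an upper percentile of its transformed values, and the toy gene names and values. It is not the published implementation.

```python
from statistics import median


def copa_transform(values):
    """Median-center one gene's values and scale them by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # A gene with no spread carries no outlier information.
        return [0.0 for _ in values]
    return [(v - med) / mad for v in values]


def percentile(values, q):
    """Nearest-rank percentile; q is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]


def copa_rank(expression, q=90):
    """Rank genes by the q-th percentile of their transformed values."""
    scores = {gene: percentile(copa_transform(vals), q)
              for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression values across ten samples.
toy = {
    "GENE_A": [1.0, 1.0, 1.2, 0.9, 1.1, 1.0, 1.0, 0.8, 9.0, 10.0],
    "GENE_B": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0],
}
print(copa_rank(toy))
```

A gene that is strongly overexpressed in only a subset of samples (GENE_A here) keeps a small MAD, so its outlier samples receive large transformed values, whereas a mean-based score would dilute them.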

In  contrast  to  using  COPA  and  exon-walking  quantitative  PCR  to 
identify fusion gene candidates, several labs are now employing next
generation sequencing methods wherein DNA or mRNA can be
fragmented,  sequenced  and  mapped  to  the  genome  in  a  matter  of 
weeks  to  identify  gene  fusions.  Various  commercial  platforms  have 
been  developed  with  the  intent  of  sequencing  as  much  of  the  genome 
or  transcriptome  as  possible  and  are  classified  based  on  the  length  of 
the  templates  each  platform  sequences.  Long  read  technologies,  like 
454,  can  sequence  long  templates  (>1  kb)  whereas  short  read 
technologies, like Solexa and SOLiD, are currently capable of
sequencing  35-50  nucleotide  templates.  At  first  glance,  long  read 
technologies  may  appear  to  have  the  advantage  of  making  genome  (or 
transcriptome) re-assembly much simpler than short read technologies.
However, a major advantage of short read technologies is the
depth  of  coverage,  or  the  number  of  times  a  segment  of  the  genome  is 
sequenced,  which  is  currently  much  higher  for  short  read  than  long 
read  technologies.  As  such,  the  choice  of  technology  is  still  dependent 
on  the  scientific  question. 
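
The trade-off between read length and depth can be made concrete with the usual back-of-the-envelope estimate, in which mean depth equals the number of reads multiplied by the read length and divided by the size of the target. The run sizes and the target size below are invented for illustration and do not describe any particular platform or experiment.

```python
def mean_depth(num_reads, read_length, target_size):
    """Average number of times each base of the target is sequenced."""
    return num_reads * read_length / target_size


# Hypothetical runs against a hypothetical 50 Mb transcriptome.
TARGET_SIZE = 50_000_000
short_run = mean_depth(num_reads=10_000_000, read_length=36,
                       target_size=TARGET_SIZE)
long_run = mean_depth(num_reads=400_000, read_length=250,
                      target_size=TARGET_SIZE)

print(f"short read run: {short_run:.1f}x")
print(f"long read run:  {long_run:.1f}x")
```

Under these assumed numbers the short read run reaches greater depth despite its much shorter reads, because the number of reads dominates the product.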

If  our  question  is  to  identify  the  best  method  for  novel  fusion  gene 
discovery,  we  assume  that  sequencing  the  transcriptome  space  will  be 
much more efficient than sequencing cancer genomes. In theory, the
discovery  of  gene  fusions  by  long  read  technology  will  require 
sequencing  across  the  actual  gene  fusion  boundary  of  the  chimeric 
transcript.  In  contrast,  short  read  technologies  may  be  able  to  identify 
gene fusions by two different methods. The first and most straightforward
method is the identification of sufficient short reads that do
not  map  directly  to  the  transcriptome,  but  correspond  to  the  gene 
fusion  boundary;  and  these  short  reads  should  identify  both 
contributing  genes  with  high  probability.  Second,  because  transcripts 
are  thought  to  be  sequenced  with  a  uniform  distribution  across  the 
length of the transcript, except at the extreme 5' and 3' ends, exon
expression  for  each  transcript  can  be  analyzed.  Genes  involved  in 
rearrangements,  leading  to  chimeric  transcripts,  would  be  expected  to 
lack  any  exon  expression  on  one  of  the  transcript  ends.  However,  this 
method  will  need  to  be  carefully  developed,  as  mapping  of  short  reads 
to  duplicated  sequences  (or  sequences  that  appear  more  than  one 
time  in  the  genome)  remains  challenging. 
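
The second, exon-expression approach can be sketched as a search for the exon boundary that best separates a silent end of a transcript from an expressed end. This is a minimal illustration under stated assumptions: read counts are already normalized for exon length, exons are ordered 5' to 3', and the function name, fold threshold and pseudocount are hypothetical choices rather than an established method.

```python
def find_exon_imbalance(exon_counts, min_fold=5.0, pseudocount=1.0):
    """Find the exon boundary with the largest 5'/3' coverage imbalance.

    Returns (boundary_index, fold, label), where exons before
    boundary_index form the 5' side, or None if no boundary reaches
    min_fold.
    """
    best = None
    for i in range(1, len(exon_counts)):
        upstream = sum(exon_counts[:i]) / i
        downstream = sum(exon_counts[i:]) / (len(exon_counts) - i)
        fold = (downstream + pseudocount) / (upstream + pseudocount)
        if fold >= 1:
            ratio, label = fold, "3' exons retained"
        else:
            ratio, label = 1 / fold, "5' exons retained"
        if best is None or ratio > best[1]:
            best = (i, ratio, label)
    if best is not None and best[1] >= min_fold:
        return best
    return None


# Hypothetical per-exon counts for a gene acting as a 3' fusion partner:
# the first three exons are nearly silent, the remainder are expressed.
print(find_exon_imbalance([0, 1, 0, 40, 38, 45, 41]))
# Evenly covered transcript: no candidate boundary is reported.
print(find_exon_imbalance([30, 32, 29, 31, 30]))
```

Reads that map ambiguously to duplicated sequences would distort these counts, which is the caveat raised in the text.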

To test whether short or long read technology was better for the
discovery of recurrent gene fusions, we recently sought to “rediscover”
the known gene fusions BCR-ABL1 and TMPRSS2-ERG by
sequencing the RNA transcriptome of either the leukemia cell line
K562 or the prostate cell line VCaP, respectively, with both short and
long  read  platforms  [171].  Initially  both  technologies  were  able  to 
identify  the  known  gene  fusion  from  the  sample,  but  were  also  able  to 
identify  several  other  candidate  gene  fusions.  For  example,  the 
Illumina  short  read  platform  nominated  428  candidates  from  the 
VCaP cell line [171]. However, most of these candidates were likely to
result  from  either  trans-splicing  [198],  co-transcription  of  adjacent 
genes  followed  by  intergenic  splicing  [199],  or  as  a  consequence  of  the 
sample  preparation  protocol.  In  order  to  reduce  the  list  of  potential 
candidate  genes,  we  intersected  the  results  of  the  two  platforms  to 
yield  a  much  more  condensed  list.  Indeed  by  integrating  the  short 
read  and  long  read  platforms  rather  than  constraining  the  analysis  to 
either  short  or  long  read  technology,  we  were  able  to  significantly 
reduce  the  percent  of  false  positive  gene  fusions  discovered  [171]. 
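
The intersection step can be sketched as a simple set operation. This is an illustrative sketch only: it assumes each platform's candidates are available as (5' gene, 3' gene) pairs, and the function name and the placeholder gene names `GENE_A` through `GENE_D` are hypothetical.

```python
from typing import Iterable, Set, Tuple

FusionPair = Tuple[str, str]  # (5' partner, 3' partner)


def intersect_candidates(
    short_read: Iterable[FusionPair],
    long_read: Iterable[FusionPair],
) -> Set[FusionPair]:
    """Keep only fusion candidates nominated by both platforms."""

    def canon(pair: FusionPair) -> FusionPair:
        # Normalize case so that 'Erg' and 'ERG' compare equal.
        return (pair[0].upper(), pair[1].upper())

    return {canon(p) for p in short_read} & {canon(p) for p in long_read}


if __name__ == "__main__":
    # Placeholder candidate lists; only TMPRSS2-ERG is a real fusion here.
    short_hits = [("TMPRSS2", "ERG"), ("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")]
    long_hits = [("tmprss2", "erg"), ("GENE_C", "GENE_A")]
    print(intersect_candidates(short_hits, long_hits))
```

Keeping the pair ordered preserves which gene contributes the 5' end. An unordered comparison would be more permissive.
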

In  the  future,  an  even  newer  adaptation  of  next  generation 
sequencing  will  likely  replace  the  current  reliance  on  both  short  and 
long  read  technologies  for  fusion  gene  discovery.  Paired  end 
sequencing  is  a  method  in  which  short  read  technology  is  used  to 
sequence  nucleotides  from  both  the  5'  and  3'  ends  of  200-300 
nucleotide  fragments  of  the  genome  (or  transcriptome).  By  sequencing 
both  ends  of  a  fragmented  RNA,  paired  end  sequencing  not  only 
enhances  the  reliability  of  mapping  and  assembly,  but  also  maintains 
significant  sequencing  depth.  In  a  manner  similar  to  our  recent 
integration  of  short  and  long  read  platforms,  the  use  of  paired  end 
sequencing  technology  for  gene  fusion  discovery  should  first  be 
examined  by  comparing  the  ability  of  matched  mate-pairs  to  identify 
known  gene  fusions  from  control  samples.  With  paired  end  sequencing, 
a  single  sample  preparation  and  individual  sequencing  run  will 
hopefully  provide  sufficient  coverage  for  gene  fusion  discovery.  These 
improvements,  as  well  as  other  advancements  in  modern 
sequencing  technologies,  will  likewise  lead  to  a  dramatic  improvement 
in  our  ability  to  identify  novel,  pathogenic  gene  fusions. 
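
One way paired end data could nominate fusions is by counting mate pairs whose two ends map to different genes. The sketch below assumes each mate has already been assigned to a gene (or to `None` if unmapped). The function name, the `min_support` threshold, and the placeholder genes `GENE_X` and `GENE_Y` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

MatePair = Tuple[Optional[str], Optional[str]]  # gene hit by each mate, None if unmapped


def nominate_fusions(
    mate_pairs: Iterable[MatePair],
    min_support: int = 2,  # hypothetical minimum number of supporting mate pairs
) -> Dict[Tuple[str, str], int]:
    """Count mate pairs whose ends map to two different genes."""
    support: Counter = Counter()
    for gene_1, gene_2 in mate_pairs:
        # Skip unmapped mates and pairs that fall within a single gene.
        if gene_1 is None or gene_2 is None or gene_1 == gene_2:
            continue
        # Sort so that (A, B) and (B, A) support the same candidate.
        support[tuple(sorted((gene_1, gene_2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


if __name__ == "__main__":
    pairs = [
        ("TMPRSS2", "ERG"),
        ("ERG", "TMPRSS2"),
        ("ERG", "ERG"),
        ("GENE_X", "GENE_Y"),
        (None, "ERG"),
    ]
    print(nominate_fusions(pairs))
```

Requiring several independent supporting pairs is one simple guard against chimeras introduced during sample preparation, although it does not distinguish trans-splicing or read-through transcripts from genomic rearrangements.
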

8.  Lessons  from  the  JAZF1-JJAZ1  chimera 

Advances  in  sequencing  technology  will  most  likely  lead  to  a  rapid 
increase  in  the  number  of  characterized  gene  fusions  over  the  next 
few  years.  However,  a  much  more  pertinent  question  may  address  the 
reasons  for  chromosomal  rearrangements  leading  to  gene  fusions. 
Could  fusion  transcripts  be  a  part  of  normal  cell  biology?  It  is  also 
plausible  that  tissue-specific  fusions  could  impart  growth  advantages 
that  allow  a  cell  to  survive  traumatic  stress.  Nonetheless,  while  the 
underlying  molecular  mechanisms  triggering  genomic  rearrangement 
are  still  unclear,  we  surmise  that  once  a  genomic  rearrangement 
occurs,  cells  harboring  favorable  gene  fusions  will  be  selected  over 
time. 

Insight  into  the  development  of  genomic  rearrangements  may 
come  from  fundamental  observations  made  following  the  study  of 
endometrial  stromal  (EMS)  tumors.  In  2001,  a  recurrent  translocation 
t(7;17)(p15;q21)  was  demonstrated  to  occur  in  EMS  tumors  that  led 
to  expression  of  the  chimeric  JAZF1/JJAZ1  mRNA  transcript  [200]. 
Although  the  mechanism  leading  to  this  rearrangement  remains 
unknown,  a  recent  study  demonstrated  that  trans-splicing  of  RNAs  in 
normal  human  endometrial  stromal  cells  can  lead  to  the  chimeric 
JAZF1/JJAZ1  RNA  and  protein  independent  of  chromosomal  rearrangement 
[201].  This  observation  suggests  that  certain  gene  fusions  may 
be  generated  by  trans-splicing  of  RNAs,  which  then  lead  to 
chromosomal  rearrangement  due  to  their  pro-neoplastic  nature. 
Interestingly,  the  group  also  demonstrated  that  the  RNA  trans-splicing 
event  leading  to  the  JAZF1/JJAZ1  chimera  was  inhibited  at  high 
concentrations  of  either  estrogen  or  progesterone,  further  suggesting 
that  certain  RNA  fusions  may  occur  in  a  hormone-dependent  manner. 
The  question  of  whether  or  not  other  specific  gene  fusions  arise  due  to 
abnormal  exposure  to  specific  hormones  has  not  been  studied. 

9.  Conclusions 

A  limited  number  of  epithelial  gene  fusions  have  been  described 
and  the  quest  for  novel  recurrent  gene  fusions,  like  the  discovery  of 
TMPRSS2-ERG  gene  fusions  in  prostate  cancer,  may  provide  major 
advances  in  cancer  research  in  the  near  future.  Here,  we  have 
demonstrated  that  gene  fusions  lead  to  overexpression  or  constitutive 
activation  of  oncogenes  by  a  variety  of  unique  mechanisms  including 
fusion  of  housekeeping  or  tissue-specific  gene  promoters  to  oncogenes, 
as  in  the  case  of  the  TMPRSS2  gene  promoter  and  5'-UTR  to  ERG  or, 
as  in  the  case  of  HMGA2,  through  evasion  of  a  microRNA  by 
replacement  of  an  oncogene's  3'-UTR.  Despite  the  multitude  of 
mechanisms  used  by  chimeric  transcripts  to  drive  malignancy,  several 
important  lessons  can  be  taken  from  characterized  epithelial  gene 
fusions,  studies  of  MLL  translocations,  as  well  as  the  very  recent 
discovery  of  JAZF1-JJAZ1  RNA  fusions,  which  precede  genomic 
rearrangement  in  specific  cell  types. 

As  in  the  case  of  Imatinib  and  BCR-ABL1,  perhaps  one  of  the 
best  methods  for  interfering  with  the  development  of  specific 
malignancies  will  be  through  inhibition  of  well-characterized, 
pathogenic  fusion  genes  with  rationally  designed,  molecularly  tailored 
therapies.  In  the  future,  the  use  of  both  COPA  and  high-throughput 
massively  parallel  sequencing  will  greatly  increase  the  speed  and 
reliability  of  fusion  gene  discovery  on  both  the  genomic  and 
transcriptomic  levels.  We  expect  many  more  gene  fusions  to  be 
reported  over  the  next  several  years  in  various  tumor  types,  many  of 
which  will  hopefully  serve  as  rational  drug  targets. 

Values  are  normalized  to  HME  (NORM:  HME).

                Prostate                                Breast
TargetID        PrEC      RWPE      PC3       DU145     MCF10A    SUM149    SUM190    BT20      HCC1937
hsa-miR-296-5p  0.832594  4.446693  0.282801  2.938353  0.954421  0.228308  0.168524  0.192569  0.192236
hsa-miR-708     0.597511  1.740731  0.02235   0.023809  0.786263  0.027347  0.024853  0.023295  0.025006
hsa-miR-663     1.742116  0.405908  0.15265   0.387818  0.66306   0.175342  0.078522  0.475149  0.106728
hsa-miR-31*     1.43628   1.84149   0.659765  0.909713  0.627522  0.028181  0.021377  0.595517  0.986864
hsa-miR-31      1.105523  1.126393  0.910725  0.981528  0.614428  0.031018  0.023398  0.758384  0.972423

Table  1:  Summarized  Illumina  microRNA  array  bead  station  data.  microRNAs  that  were  more  than 
2-fold  down-  or  upregulated  in  both  IBC  cell  lines  SUM149  and  SUM190,  but  not  other  breast  (MCF10A, 
BT20  and  HCC1937)  or  prostate  (PrEC,  RWPE,  PC3  and  DU145)  cells  as  compared  to  HME  cells  were 
identified.  Validations  were  performed  by  qPCR  using  Taqman  probes  specific  for  both  miR-31  and 
miR-31*. 
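
The selection rule in the caption can be sketched as a filter over the HME-normalized ratios. This is a sketch under stated assumptions: the column order (PrEC, RWPE, PC3, DU145, MCF10A, SUM149, SUM190, BT20, HCC1937) is read off the table header, and the strict reading of the rule used here (both IBC lines changed by more than 2-fold, no other line changed by more than 2-fold) is an interpretation. The exact rule applied by the authors is not stated and may have differed.

```python
from typing import Dict, List

# Assumed column order, read off the table header (HME is the reference).
CELL_LINES = ["PrEC", "RWPE", "PC3", "DU145", "MCF10A",
              "SUM149", "SUM190", "BT20", "HCC1937"]
IBC_LINES = {"SUM149", "SUM190"}

# HME-normalized values copied from Table 1.
TABLE_1: Dict[str, List[float]] = {
    "hsa-miR-296-5p": [0.832594, 4.446693, 0.282801, 2.938353, 0.954421,
                       0.228308, 0.168524, 0.192569, 0.192236],
    "hsa-miR-708":    [0.597511, 1.740731, 0.02235, 0.023809, 0.786263,
                       0.027347, 0.024853, 0.023295, 0.025006],
    "hsa-miR-663":    [1.742116, 0.405908, 0.15265, 0.387818, 0.66306,
                       0.175342, 0.078522, 0.475149, 0.106728],
    "hsa-miR-31*":    [1.43628, 1.84149, 0.659765, 0.909713, 0.627522,
                       0.028181, 0.021377, 0.595517, 0.986864],
    "hsa-miR-31":     [1.105523, 1.126393, 0.910725, 0.981528, 0.614428,
                       0.031018, 0.023398, 0.758384, 0.972423],
}


def is_changed(ratio: float, fold: float = 2.0) -> bool:
    """True if the ratio to HME is more than `fold` up or down."""
    return ratio > fold or ratio < 1.0 / fold


def ibc_specific(table: Dict[str, List[float]], fold: float = 2.0) -> List[str]:
    """Strict reading: changed in both IBC lines and in no other line."""
    hits = []
    for mirna, ratios in table.items():
        by_line = dict(zip(CELL_LINES, ratios))
        ibc_changed = all(is_changed(by_line[c], fold) for c in IBC_LINES)
        others_changed = any(
            is_changed(v, fold) for c, v in by_line.items() if c not in IBC_LINES
        )
        if ibc_changed and not others_changed:
            hits.append(mirna)
    return hits


if __name__ == "__main__":
    print(ibc_specific(TABLE_1))
```

Relaxing the second condition, for example by restricting the comparison to the breast lines only, would change which microRNAs pass the filter.
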


